Building Business-Ready Generative AI Systems
Build Human-Centered Generative AI Systems with Agents, Memory, and LLMs for Enterprise
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Gebin George
Relationship Lead: Ali Abidi
Project Manager: Prajakta Naik
Content Engineer: Tanya D’cruz
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Indexer: Hemangini Bari
Proofreader: Tanya D’cruz
Production Designer: Shantanu Zagade
Growth Lead: Nimisha Dua
Marketing Owner: Dipali Malwatkar
First published: July 2025
Production reference: 1220725
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83702-069-0
I would like to dedicate this book to my family and friends who are my source of happiness.
– Denis Rothman
Denis Rothman is a graduate of Sorbonne University and Paris-Diderot University. He pioneered one of the first patented word2matrix embedding algorithms and AI-powered conversational agents. Early in his career, Denis developed a cognitive NLP chatbot adopted by Moët & Chandon and other global brands as an automated language trainer. He later created an AI resource optimizer for Airbus (formerly Aérospatiale), which was implemented by IBM and leading companies in the apparel industry. His Advanced Planning and Scheduling (APS) solution, used worldwide, has shaped supply chain intelligence across sectors.
Through his books, Denis shares his experience of innovation with a global community of thinkers, builders, and learners committed to shaping AI with purpose.
Martin is an experienced software engineer who has spent nearly a decade tackling complex challenges in fields like aerospace and medical technology. As an instructor and professor of computer science at Fitchburg State University, Martin is deeply familiar with OpenAI’s suite of APIs and excels at building, training, and refining practical AI systems. He is also a recognized author who enjoys making AI development accessible to others.
is a machine learning engineer with extensive experience in natural language processing. He has worked at several startups, creating AI applications, models, and production systems. He is an active contributor to LangChain, with a particular interest in model quantization and AI agent architectures.
is a Carnegie Mellon University (CMU) alumna from the Language Technologies Institute (LTI). She currently works as a Senior AI/ML Applied Associate at JPMorgan Chase, within their specialized Machine Learning Center of Excellence (MLCOE). She has delivered several talks at events and conferences, including WiDS, PyData, SciPy, Illuminate AI, and the TensorFlow User Group. She also actively supports early-career professionals through mentorship programs with Anita Borg and Women in Coding and Data Science (WiCDS).
New frameworks, evolving architectures, research drops, production breakdowns—AI_Distilled filters the noise into a weekly briefing for engineers and researchers working hands-on with LLMs and GenAI systems. Subscribe now and receive a free eBook, along with weekly insights that help you stay focused and informed.
Subscribe at https://packt.link/TRO5B or scan the QR code below.
In the rapidly evolving AI landscape, standalone large language models (LLMs) alone don’t deliver business value. This comprehensive guide enables you to become a generative AI architect, building powerful ChatGPT-grade systems from scratch that are adaptable to any platform. You’ll go beyond basic chatbots, developing systems capable of semantic and sentiment analysis—with context-aware AI controllers at their core.
You’ll walk through the complete journey of designing an AI controller architecture with multi-user memory retention, enabling the system to adapt to diverse user and system inputs. Building on this foundation, you’ll architect a dynamic retrieval-augmented generation (RAG) system using Pinecone, designed to intelligently combine instruction scenarios.
With powerful multimodal capabilities—including image generation, voice interactions, and machine-driven reasoning—you’ll expand your system’s intelligence using chain-of-thought orchestration to tackle the cross-domain automation challenges that real business environments demand. You’ll also be fully equipped to integrate cutting-edge models such as OpenAI’s LLMs and DeepSeek-R1 into your AI system—confident that it will remain stable, scalable, and production-ready even in the turbulent pace of today’s AI ecosystem.
By the end, your generative AI system (GenAISys) will be capable of trajectory analysis and human mobility prediction, even when working with incomplete data. It will bring neuroscience-inspired insight to your marketing messages, integrate seamlessly into human workflows, visualize complex delivery and movement scenarios, and connect to live external data—all wrapped in a polished, investor-ready interface.
This book is for AI and machine learning engineers, software architects, and enterprise developers seeking to architect and build a comprehensive GenAISys from scratch for enterprise applications. It will particularly benefit those interested in building AI agents, creating advanced orchestration systems, and leveraging AI for automation in marketing, production, and logistics. Software architects and enterprise developers looking to build scalable AI-driven systems will also find immense value in this guide. No prior superintelligence experience is necessary, but familiarity with AI concepts is recommended.
Chapter 1, Defining a Business-Ready Generative AI System, explains how to design robust, ChatGPT-level AI systems tailored for business use, incorporating AI controllers, agents, contextual awareness, advanced memory retention, and strategic human collaboration. You will explore how to go beyond basic model integrations to build AI solutions that deliver real-world business results across industries and become an architect who can adapt to any environment.
Chapter 2, Building the Generative AI Controller, dives into hands-on development of adaptive AI controllers, combining conversational agents and orchestrators. You’ll become proficient in building a GenAISys that dynamically responds to user input, leverages advanced memory management, and orchestrates tasks such as sentiment and semantic analysis, which are essential for real-world, data-intensive applications.
Chapter 3, Integrating Dynamic RAG into the GenAISys, teaches you how to architect a scalable, dynamic RAG system using Pinecone, effectively combining instruction scenarios and classical data. You will discover how to build adaptable generative AI solutions capable of responding rapidly to real-world business disruptions and domain-specific challenges.
Chapter 4, Building the AI Controller Orchestration Interface, shows how to build a collaborative, event-driven generative AI interface designed for rapid, real-world business responses. We’ll create a practical, multi-user conversational AI that integrates seamlessly into human workflows, enhancing decision-making during critical scenarios.
Chapter 5, Adding Multimodal, Multifunctional Reasoning with Chain of Thought, enhances our GenAISys with OpenAI’s powerful multimodal capabilities, including image generation, voice interactions, and machine-driven reasoning. We’ll expand our system’s intelligence using chain-of-thought orchestration—enabling smart automation tailored to complex, cross-domain business scenarios.
Chapter 6, Reasoning E-Marketing AI Agents, harnesses the power of AI-driven memory analysis to create marketing messages customers remember. We’ll build a consumer memory agent that intelligently analyzes reviews and emotions, producing personalized marketing content enhanced by multimodal reasoning—bringing neuroscience insights directly into our GenAISys.
Chapter 7, Enhancing the GenAISys with DeepSeek, navigates the rapid evolution of AI without compromising stability. You will learn how to strategically integrate cutting-edge models such as DeepSeek-R1 into our GenAISys, using a flexible handler selection mechanism—enabling our system to remain adaptable, scalable, and production-ready in a fast-changing market.
Chapter 8, GenAISys for Trajectory Simulation and Prediction, brings advanced AI-driven trajectory analysis into our GenAISys to precisely predict human mobility and deliveries, even with incomplete data. We’ll build an intuitive pipeline that leverages LLMs to simulate, analyze, and visualize complex delivery and movement scenarios, transforming spatial data into actionable insights.
Chapter 9, Upgrading the GenAISys with Data Security and Moderation for Customer Service, connects our GenAISys securely to real-world data by integrating live weather information, travel information, robust moderation tools, and advanced security features. You will discover how to safely extend our AI’s capabilities beyond internal data, enabling personalized marketing, dynamic activity recommendations, and real-time customer interactions.
Chapter 10, Presenting Your Business-Ready Generative AI System, will show you how to turn your GenAISys proof of concept into a compelling, investor-ready showcase. We’ll learn how to effectively present our AI system with clear messaging, practical demos, and a polished user interface, capturing audience attention, demonstrating value, and proving readiness to scale in a crowded AI marketplace.
You don’t need to be an AI expert to benefit from this guide, just some familiarity with the basics of artificial intelligence and programming. If you’ve worked with Python or explored LLMs like ChatGPT, you’ll feel right at home.
This book introduces concepts like AI agent orchestration, memory retention, and RAG in clear, hands-on steps. Whether you’re a developer, engineer, or curious technologist, you’ll find tools, code, and explanations designed to support real-world applications.
No deep math or theoretical background is required, only a desire to build meaningful AI systems that solve business problems and scale in production environments.
The code is available in Google Colab notebooks with automated installations, or you can download the code locally.
The code bundle for the book is hosted on GitHub at https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781837020690.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter/X handles. For example: “Next, we incorporate instruct_selector into the existing interface layout (VBox).”
A block of code is set as follows:
from ipywidgets import Dropdown, Layout  # required for the widget below

# Ensure 'Instructions' exists in the memory_selector options
instruct_selector = Dropdown(
    options=["None", "Analysis", "Generation"],
    value="None",  # Ensure default active_memory is in the options
    description='Reasoning:',
    layout=Layout(width='50%')
)
Any command-line input or output is written as follows:
Response: The dialog begins by explaining the formation of Hawaii's volcanic islands as the Pacific Plate moves over a stationary hotspot, leading to active volcanoes like Kilauea….
Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “The user then reenters the sentence, but this time with the Generation option and the Files option checked, so that the image generated with the text will be displayed.”
Warnings or important notes appear like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book or have any general feedback, please email us at customercare@packt.com and mention the book’s title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packt.com/submit-errata, click Submit Errata, and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packt.com/.
Once you’ve read Building Business-Ready Generative AI Systems, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
You’re not the only one navigating fragmented tools, constant updates, and unclear best practices. Join a growing community of professionals exchanging insights that don’t make it into documentation.
Stay informed with updates, discussions, and behind-the-scenes insights from our authors. Join our Discord at https://packt.link/z8ivB or scan the QR code below:
Connect with peers, share ideas, and discuss real-world GenAI challenges. Follow us on Reddit at https://packt.link/0rExL or scan the QR code below:
Unlock this book’s exclusive benefits now
Scan this QR code or go to packtpub.com/unlock, then search this book by name. Ensure it’s the correct edition.
Note: Keep your purchase invoice ready before you start.
Enhanced reading experience with our Next-gen Reader:
Multi-device progress sync: Learn from any device with seamless progress sync.
Highlighting and notetaking: Turn your reading into lasting knowledge.
Bookmarking: Revisit your most important learnings anytime.
Dark mode: Focus with minimal eye strain by switching to dark or sepia mode.
Learn smarter using our AI assistant (Beta):
Summarize it: Summarize key sections or an entire chapter.
AI code explainers: In the next-gen Packt Reader, click the Explain button above each code block for AI-powered code explanations.
Note: The AI assistant is part of next-gen Packt Reader and is still in beta.
Learn anytime, anywhere:
Access your content offline with DRM-free PDF and ePub versions—compatible with your favorite e-readers.
Your copy of this book comes with the following exclusive benefits:
Next-gen Packt Reader
AI assistant (beta)
DRM-free PDF/ePub downloads
Use the following guide to unlock them if you haven’t already. The process takes just a few minutes and needs to be done only once.
Keep your purchase invoice for this book ready, as you’ll need it in Step 3. If you received a physical invoice, scan it on your phone and have it ready as either a PDF, JPG, or PNG.
For more help on finding your invoice, visit https://www.packtpub.com/unlock-benefits/help
Note: Did you buy this book directly from Packt? You don’t need an invoice. After completing Step 2, you can jump straight to your exclusive content.
Step 2
Scan this QR code or go to packtpub.com/unlock.
On the page that opens (which will look similar to Figure 0.1 if you’re on desktop), search for this book by name. Make sure you select the correct edition.
Figure 0.1: Packt unlock landing page on desktop
Sign in to your Packt account or create a new one for free. Once you’re logged in, upload your invoice. It can be in PDF, PNG, or JPG format and must be no larger than 10 MB. Follow the rest of the instructions on the screen to complete the process.
Need help?
If you get stuck and need help, visit https://www.packtpub.com/unlock-benefits/help for a detailed FAQ on how to find your invoices and more. The following QR code will take you to the help page directly:
Note: If you are still facing issues, reach out to customercare@packt.com.
Implementing a generative AI system (GenAISys) in an organization doesn’t stop at simply integrating a standalone model such as GPT, Grok, Llama, or Gemini via an API. While this is often a starting point, we often mistake it for the finish line. The rising demand for AI, as it expands across all domains, calls for the implementation of advanced AI systems that go beyond simply integrating a prebuilt model.
A business-ready GenAISys should provide ChatGPT-grade functionality in an organization, but also go well beyond it. Its capabilities and features must include natural language understanding (NLU), contextual awareness through memory retention across dialogues in a chat session, and agentic functions such as autonomous image, audio, and document analysis and generation. Think of a generative AI model as an entity with a wide range of functions, including AI agents as agentic co-workers.
We will begin the chapter by defining what a business-ready GenAISys is. From there, we’ll focus on the central role of a generative AI model, such as GPT-4o, that can both orchestrate and execute tasks. Building on that, we will lay the groundwork for contextual awareness and memory retention, discussing four types of generative AI memory: memoryless, short-term, long-term, and multiple sessions. We will also define a new approach to retrieval-augmented generation (RAG) that introduces an additional dimension to data retrieval: instruction and agentic reasoning scenarios. Adding instructions stored in a vector store takes RAG to another level by retrieving instructions that we can add to a prompt. In parallel, we will examine a critical component of a GenAISys: human roles. We will see how, throughout its life cycle, an AI system requires human expertise. Additionally, we will define several levels of implementation to adapt the scope and scale of a GenAISys, not only to business requirements but also to available budgets and resources.
Finally, we’ll illustrate how contextual awareness and memory retention can be implemented using OpenAI’s LLM and multimodal API. A GenAISys cannot work without solid memory retention functionality—without memory, there’s no context, and without context, there’s no sustainable generation. Throughout this book, we will create modules for memoryless, short-term, long-term, and multisession types depending on the task at hand. By the end of this chapter, you will have acquired a clear conceptual framework for what makes an AI system business-ready and practical experience in building the first bricks of an AI controller.
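The four memory modes described above can be sketched as a small session store. This is a minimal illustration, assuming a plain in-process Python store; the class and method names are hypothetical, not the modules built later in the book.

```python
# A minimal sketch of the four memory-retention modes, assuming a simple
# in-process store. Class and method names are illustrative, not the
# book's actual modules.

class SessionMemory:
    """Retains chat turns according to one of four modes."""

    MODES = {"memoryless", "short-term", "long-term", "multisession"}

    def __init__(self, mode="short-term"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.turns = []     # history of the current session
        self.archive = {}   # persisted history, keyed by session ID

    def add_turn(self, session_id, role, text):
        if self.mode == "memoryless":
            return  # nothing survives past the current prompt
        self.turns.append((role, text))
        if self.mode in {"long-term", "multisession"}:
            self.archive.setdefault(session_id, []).append((role, text))

    def context(self, session_id):
        """History to prepend to the next prompt."""
        if self.mode == "memoryless":
            return []
        if self.mode == "multisession":
            # merge every stored session for cross-session continuity
            return [t for s in self.archive.values() for t in s]
        return list(self.turns)

memory = SessionMemory(mode="short-term")
memory.add_turn("s1", "user", "What is RAG?")
memory.add_turn("s1", "assistant", "Retrieval-augmented generation.")
print(len(memory.context("s1")))  # prints 2
```

In a real GenAISys, the retained turns would be serialized into the prompt sent to the model; the mode chosen per task decides how much history travels with each request.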
In a nutshell, this chapter covers the following topics:
Let’s begin by defining what a business-ready GenAISys is.
A business-ready GenAISys is a modular orchestrator that seamlessly integrates standard AI models with multifunctional frameworks to deliver hybrid intelligence. By combining generative AI with agentic functionality, RAG, machine learning (ML), web search, non-AI operations, and multiple-session memory systems, we are able to deliver scalable and adaptive solutions for diverse and complex tasks. Take ChatGPT, for example; people use the name “ChatGPT” interchangeably for the generative AI model as well as for the application itself. However, behind the chat interface, tools such as ChatGPT and Gemini are part of larger systems—online copilots—that are fully integrated and managed by intelligent AI controllers to provide a smooth user experience.
It was Tomczak (2024) who took us from thinking of generative AI models as a collective entity to considering complex GenAISys architectures. His paper uses the term “GenAISys” to describe these more complex platforms. Our approach in this book will be to expand the horizon of a GenAISys to include advanced AI controller functionality and human roles in a business-ready ecosystem. There is no single silver-bullet architecture for a GenAISys. However, in this section, we’ll define the main components necessary to attain ChatGPT-level functionality. These include a generative AI model, memory retention functions, modular RAG, and multifunctional capabilities. How each component contributes to the GenAISys framework is illustrated in Figure 1.1:
Figure 1.1: GenAISys, the AI controller, and human roles
Let’s now define the architecture of the AI controllers and human roles that make up a GenAISys.
At the heart of a business-ready GenAISys is an AI controller that activates custom ChatGPT-level features based on the context of the input. Unlike traditional pipelines with predetermined task sequences, the AI controller operates without a fixed order, dynamically adapting tasks—such as web search, image analysis, and text generation—based on the specific context of each input. This agentic context-driven approach enables the AI controller to orchestrate various components seamlessly, ensuring effective and coherent performance of the generative AI model.
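The context-driven selection described above can be sketched as a tiny router. This is a hedged illustration: the keyword rules and handler names are placeholders, and a production controller would classify the input's intent with a generative model or embedding similarity rather than keyword matching.

```python
# Sketch of context-driven task selection in an AI controller. The keyword
# rules and handler names are placeholders; a production controller would
# classify intent with an LLM or embedding similarity instead.

def web_search(prompt):      return f"web search: {prompt}"
def image_analysis(prompt):  return f"image analysis: {prompt}"
def text_generation(prompt): return f"text generation: {prompt}"

ROUTES = [
    ({"search", "find", "latest"}, web_search),
    ({"image", "photo", "diagram"}, image_analysis),
]

def controller(prompt):
    """Select a task from the input's context, not a fixed pipeline order."""
    words = set(prompt.lower().split())
    for keywords, handler in ROUTES:
        if words & keywords:
            return handler(prompt)
    return text_generation(prompt)  # default when no special intent matches

print(controller("find the latest supplier news"))  # routes to web_search
```

The key design point is that no task order is hardcoded: the controller inspects each input and dispatches accordingly, which is what lets the same system serve support, marketing, or logistics prompts interchangeably.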
A lot of work is required to achieve effective results with a custom ChatGPT-grade AI controller. However, the payoff is a new class of AI systems that can withstand real-world pressure and produce tangible business results. A solid AI controller ecosystem can support use cases across multiple domains: customer support automation, sales lead generation, production optimization (services and manufacturing), healthcare response support, supply chain optimization, and any other domain the market will take you! A GenAISys, thus, requires an AI controller to orchestrate multiple pipelines, such as contextual awareness to understand the intent of the prompt and memory retention to support continuity across sessions.
The GenAISys must also define human roles, which determine which functions and data can be accessed. Before we move on to human roles, however, let’s first break down the key components that power the AI controller. As shown in Figure 1.1, the generative AI model, memory, modular RAG, and multifunctional capabilities each play vital roles in enabling flexible, context-driven orchestration. Let’s explore how these elements work together to build a business-ready GenAISys. We will first define the role of the generative AI model.
构建可持续的GenAISys系统时,我们需要模型互换性——即模型更换的灵活性。根据需要导出底层模型。生成式人工智能模型应作为系统的一个组件,而不是作为系统本身。系统的核心是围绕这个框架构建。这样一来,如果我们的模型被弃用或需要更新,或者我们找到了性能更好的模型,我们就可以轻松地用更适合我们项目的模型替换它。
When we build a sustainable GenAISys, we need model interchangeability—the flexibility to swap out the underlying model as needed. A generative AI model should serve as a component within the system, not as the core that the system is built around. That way, if our model is deprecated or requires updating, or we simply find a better-performing one, we can simply replace it with another that better fits our project.
As such, the generative AI model can be OpenAI’s GPT, Google’s Gemini, Meta’s Llama, xAI’s Grok, or any Hugging Face model, as long as it supports the required tasks. Ideally, we should choose a multipurpose, multimodal model that encompasses text, vision, and reasoning abilities. Bommasani et al. (2021) provide a comprehensive analysis of such foundation models, whose scope reaches beyond LLMs.
A generative AI model has two main functions, as shown in Figure 1.2:
Figure 1.2: A generative AI model to orchestrate or execute tasks
Notice that Figure 1.2 has a unique feature. There are no arrows directing the input, orchestration, and execution components. Unlike traditional hardcoded linear pipelines, a flexible GenAISys has its components unordered. We build the components and then let automated scenarios selected by the orchestration function order the tasks dynamically.
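A minimal sketch of this idea, with hypothetical component and scenario names: tasks are registered without a fixed order, and the orchestration function picks a sequence per request:

```python
# Components are registered without any hardcoded order; the names and
# scenarios here are illustrative, not from the book's code.

COMPONENTS = {
    "retrieve": lambda state: state + ["retrieved context"],
    "generate": lambda state: state + ["draft answer"],
    "search":   lambda state: state + ["web results"],
}

# The orchestration function maps an intent to a scenario (a task order).
SCENARIOS = {
    "question": ["retrieve", "generate"],
    "research": ["search", "retrieve", "generate"],
}

def orchestrate(intent: str):
    # Run the tasks in the order the selected scenario dictates.
    state = []
    for task in SCENARIOS[intent]:
        state = COMPONENTS[task](state)
    return state

print(orchestrate("research"))
# ['web results', 'retrieved context', 'draft answer']
```

The same components serve both scenarios; only the dynamically selected order differs.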
This flexibility ensures the system’s adaptability to a wide range of tasks. We will not be able to build a system that solves every task, but we can build one that satisfies a wide range of tasks within a company. Here are two example workflows that illustrate how a GenAISys can dynamically sequence tasks based on the roles involved:
The possibilities are unlimited; however, all the scenarios will rely on the memory to ensure consistent, context-aware behavior. Let’s look at memory next.
Advanced generative AI models such as OpenAI’s GPT, Meta’s Llama, xAI’s Grok, Google’s Gemini, and many Hugging Face variants are context-driven regardless of their specific version or performance level. You will choose the model based on your project, but the basic rule remains simple:
No-context => No meaningful generation
When we use ChatGPT or any other copilot, we have nothing to worry about as contextual memory is handled for us. We just start a dialogue, and things run smoothly as we adapt our prompt to the level of responses we are obtaining. However, when we develop a system with a generative AI API from scratch, we have to explicitly build contextual awareness and memory retention.
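The difference can be sketched in a few lines: with a raw API, the caller owns the message history and must resend it on every call. The stand-in model below is an assumption used so the example runs without an API key:

```python
# With a raw generative AI API, the conversation history must be carried
# explicitly by the caller -- there is no hidden ChatGPT-style memory.

history = []

def ask(model, user_input):
    history.append({"role": "user", "content": user_input})
    reply = model(history)                 # the model only sees what we pass
    history.append({"role": "assistant", "content": reply})
    return reply

# Stand-in model: answers with how many prior messages it can see.
fake_model = lambda messages: f"I can see {len(messages)} message(s)."

print(ask(fake_model, "Where is Hawaii?"))   # I can see 1 message(s).
print(ask(fake_model, "How far is it?"))     # I can see 3 message(s).
```

Dropping the `history` list would make every turn stateless, which is exactly the "no context, no meaningful generation" failure mode.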
Four approaches stand out among the wide range of possible memory retention strategies with an API:
Figure 1.3 sums up these four memory architectures. We’ll demonstrate each configuration in Python using GPT-4o in the upcoming section, Contextual awareness and memory retention.
Figure 1.3: Four different GenAISys memory configurations
These four memory types serve as a starting point that can be expanded as necessary when developing a GenAISys. However, practical implementations often require additional functionality, including the following:
There are native distinctions between two key categories of memorization in generative models:
We can see that building a GenAISys’s memory requires careful design and deliberate development to implement ChatGPT-grade memory and additional memory configurations, such as long-term, cross-topic sessions. The ultimate goal, however, of this advanced memory system is to enhance the model’s contextual awareness. While generative AI models such as GPT-4o have inbuilt contextual awareness, to expand the scope of a context-driven system such as the GenAISys we’re building, we need to integrate advanced RAG functionality.
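As a rough sketch of one such configuration, long-term, cross-topic memory can be approximated by storing exchanges per (user, topic) pair; the storage layout here is an illustrative assumption, not the book's implementation:

```python
# Long-term, cross-topic memory sketch: exchanges are stored per
# (user, topic) so a later session can reload an earlier topic's context.

from collections import defaultdict

memory = defaultdict(list)   # (user, topic) -> list of exchanges

def remember(user, topic, role, content):
    memory[(user, topic)].append({"role": role, "content": content})

def recall(user, topic):
    """Return the stored context for a topic, e.g., to prepend to a new prompt."""
    return memory[(user, topic)]

remember("alice", "travel", "user", "Plan a trip to Hawaii")
remember("alice", "travel", "assistant", "Here is a 5-day itinerary...")
remember("alice", "finance", "user", "Summarize Q3 costs")

print(len(recall("alice", "travel")))    # 2
print(len(recall("alice", "finance")))   # 1
```

Keying by topic is what lets a session resume "travel" without dragging "finance" into the context window.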
In this section, we explain the motivations for using RAG for three core functions within a GenAISys:
Let’s begin with knowledge retrieval.
Generative AI models excel when it comes to revealing parametric knowledge that they have learned, which is embedded in their weights. This knowledge is learned during training and embedded in models such as GPT, Llama, Grok, and Gemini. However, that knowledge stops at the cutoff date when no additional data is fed to the model. At that point, to update or supplement it, we have two options:
It’s important to note here that static implicit knowledge cannot scale effectively without dynamic explicit knowledge. More on that in the upcoming chapters.
Generative AI models are expanding the boundaries of context windows. For example, at the time of writing, the following are the supported context lengths:
While impressive, these large context windows can be expensive in terms of token costs and compute. Furthermore, the main issue is that their precision diminishes when the context becomes too large. Also, we don’t need the largest context window but only the one that best fits our project. This can justify implementing RAG if necessary to optimize a project.
The chunking process of RAG splits large content into more nuanced groups of tokens. When we embed these chunks, they become vectors that can be stored and efficiently retrieved from vector stores. This approach ensures we use only the most relevant context per task, minimizing token usage and maximizing response quality. Thus, we can rely on generative AI capabilities for parametric implicit knowledge and RAG for large volumes of explicit non-parametric data in vector stores. We can take RAG further and use the method as an orchestrator.
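The chunk-embed-retrieve flow can be sketched with toy components. A real system would use a learned embedding model and a vector store such as Pinecone; the bag-of-words vectors below are a deliberate simplification so the mechanics stay visible:

```python
# Toy sketch of the chunk -> embed -> retrieve flow.

def chunk(text, size=6):
    # Split large content into small groups of tokens (here, words).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    # Bag-of-words stand-in for a real embedding model.
    vec = {}
    for w in text.lower().split():
        vec[w] = vec.get(w, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b)) if a and b else 0.0

doc = ("Hawaii is a US state in the Pacific Ocean. "
       "Our refund policy allows returns within thirty days of purchase.")
store = [(c, embed(c)) for c in chunk(doc)]   # the "vector store"

query = embed("what is the refund policy")
best = max(store, key=lambda item: cosine(query, item[1]))
print(best[0])   # the chunk mentioning the refund policy
```

Only the best-matching chunk goes into the prompt, which is how RAG keeps token usage low while keeping the context relevant.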
The AI controller bridges with RAG through the generative AI model. RAG is used to augment the model’s input with a flexible range of instructions. Now, using RAG to retrieve instructions might seem counterintuitive at first—but think about it. If we store instructions as vectors and retrieve the best set for a task, we get a fast, adaptable way to enable agentic functionality, generate effective results, and avoid the need to fine-tune the model every time we change our instruction strategies for how we want it to behave.
These instructions act as optimized prompts, tailored to the task at hand. In this sense, RAG becomes part of the orchestration layer of the AI system. A vector store such as Pinecone can store and return this functional information, as illustrated in Figure 1.4:
Figure 1.4: RAG orchestration functionality
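A hedged sketch of this instruction-retrieval idea: instruction sets are stored, and the best match for the task steers the model, avoiding fine-tuning. Keyword overlap stands in for real vector similarity, and the instruction texts are invented examples:

```python
# Instruction sets stored for retrieval -- RAG as part of orchestration.
# The entries and scoring below are illustrative assumptions.

instruction_store = {
    "customer support": "You are a support agent. Be empathetic and offer a resolution.",
    "sales lead": "You are a sales assistant. Qualify the lead and suggest next steps.",
    "production": "You are a production analyst. Focus on throughput and bottlenecks.",
}

def retrieve_instruction(task_description):
    # Keyword overlap stands in for vector similarity in a real vector store.
    def score(key):
        return len(set(key.split()) & set(task_description.lower().split()))
    best_key = max(instruction_store, key=score)
    return instruction_store[best_key]

system_prompt = retrieve_instruction("angry customer asking for support on a late order")
print(system_prompt)   # the customer-support instruction set
```

Changing the instruction strategy then means updating entries in the store, not retraining or fine-tuning the model.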
The orchestration of these scenarios is performed through the following:
Adding classical functions and ML functionality to the GenAISys enhances its capabilities dramatically. The modular architecture of a GenAISys makes this multifunctional approach effective, as in the following use cases:
RAG remains a critical component of a GenAISys, and we will build it into our GenAISys in Chapter 3, Integrating Dynamic RAG into the GenAISys, where we will also enhance the system with multifunctional features.
We’ll now move on to the human roles, which form the backbone of any GenAISys.
Contrary to popular belief, the successful deployment and operation of a GenAISys—such as the ChatGPT platform—relies heavily on human involvement throughout its entire life cycle. While these tools may seem to handle complex tasks effortlessly, behind the scenes are multiple layers of human expertise, oversight, and coordination that make their smooth operation possible.
Software professionals must first design the architecture, process massive datasets, and fine-tune the system on million-dollar servers equipped with cutting-edge compute resources. After deployment, large teams are required to monitor, validate, and interpret system outputs—continuously adapting them in response to errors, emerging technologies, and regulatory changes. On top of that, when it comes to deploying these systems within organizations—whether inside corporate intranets, public-facing websites, research environments, or learning management systems—it takes cross-functional coordination efforts across multiple domains.
These tasks require high levels of expertise and qualified teams. Humans are, therefore, not just irreplaceable; they are critical! They are architects, supervisors, curators, and guardians of the AI systems they create and maintain.
Implementing a GenAISys requires technical skills and teamwork to gain the support of end users. It’s a collaborative challenge between AI controller design, user roles, and expectations. To anyone who thinks that deploying a real-world AI system is just about getting access to a model—such as the latest GPT, Llama, or Gemini—a close look at the resources required will reveal the true challenges. A massive number of human resources might be involved in the development, deployment, and maintenance of an AI system. Of course, not every organization will need all of these roles, but we must recognize the range of skills involved, such as the following:
These are just examples—just enough to show how many different roles are involved in building and operating a full-scale GenAISys. Figure 1.5 shows that designing and implementing a GenAISys is a continual process, where human resources are needed at every stage.
Figure 1.5: A GenAISys life cycle
We can see that a GenAISys life cycle is a never-ending process:
Real-world AI relies heavily on human abilities—the kind of contextual and technical understanding that AI alone cannot replicate. AI can automate a wide range of tasks effectively. But it’s humans who bring the deep insight needed to align those systems with real business goals.
Let’s take this further and look at a RACI heatmap to show why humans are a critical component of a GenAISys.
Organizing a GenAISys project requires human resources that go far beyond what AI automation alone can provide. RACI is a responsibility assignment matrix that helps define roles and responsibilities for each task or decision by identifying who is Responsible, Accountable, Consulted, and Informed. RACI is ideal for managing the complexity of building a GenAISys. It adds structure to the growing list of human roles required during the system’s life cycle and provides a pragmatic framework for coordinating their involvement.
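In code, a RACI matrix is just a small mapping; the tasks, roles, and assignments below are illustrative examples rather than the exact heatmap in Figure 1.6:

```python
# Illustrative RACI matrix as a plain dict: each task maps each role to
# R (Responsible), A (Accountable), C (Consulted), or I (Informed).

raci = {
    "Design AI controller": {"MLE": "R", "PM": "A", "SME": "C", "Legal": "I"},
    "Deploy to production": {"MLE": "A", "DevOps": "R", "PM": "I", "SME": "I"},
    "Validate outputs":     {"SME": "R", "MLE": "C", "PM": "A", "Legal": "C"},
}

def tasks_for(role, letter):
    """All tasks where `role` holds the given RACI letter."""
    return [t for t, row in raci.items() if row.get(role) == letter]

print(tasks_for("MLE", "R"))   # ['Design AI controller']
```

A heatmap like Figure 1.6 is simply a color-coded rendering of such a matrix, with one row per task and one column per role.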
As in any complex project, teams working on a GenAISys need to collaborate across disciplines, and RACI helps define who does what. Each letter in RACI stands for a specific type of role:
A RACI heatmap typically contains legends for each human role in a project. Let’s build a heatmap with the following roles:
Not every GenAISys project will include all of these roles, but depending on the scope and scale of the project, many of them will be critical. Now, let’s list the key responsibilities of the roles defined above in a typical generative AI project:
We’ve defined the roles and the tasks. Now, we can show how they can be mapped to a real-world scenario. Figure 1.6 illustrates an example RACI heatmap for a GenAISys.
Figure 1.6: Example of a RACI heatmap
For example, in this heatmap, the MLE has the following responsibilities:
We can sum it up with one simple rule for a GenAISys:
No humans -> no system!
We can see that we are necessary during the whole life cycle of a GenAISys, from design to maintenance and support, including continual evolutions to keep up with user feedback. Humans have been and will be here for a long time! Next, let’s explore the business opportunities that a GenAISys can unlock.
More often than not, we will not have access to the incredible billion-dollar resources of OpenAI, Meta, xAI, or Microsoft Azure to build ChatGPT-like platforms. The previous section showed that beneath a ChatGPT-like, seemingly simple, seamless interface, there is a complex layer of expensive infrastructure, rare talent, and continuous improvement and evolution that absorb resources only large corporations can afford. Therefore, a smarter path from the start is to determine which project category we are in and leverage the power of existing modules and libraries to build our GenAISys. Whatever the use case, such as marketing, finance, production, or support, we need to find the right scope and scale to implement a realistic GenAISys.
The first step of any GenAISys is to define the project’s goal (opportunity), including its scope and scale, as we mentioned. During this step, you will assess the risks, such as costs, confidentiality, and resource availability (risk management).
We can classify GenAISys projects into three main business implementation types depending on our resources, our objectives, the complexity of our use case, and our budget. These are illustrated in Figure 1.7:
Figure 1.7: The three main GenAISys business implementations
Let’s begin with a hybrid approach, a practical way to deliver business results without overbuilding.
A hybrid framework enables you to minimize development costs and time by combining ready-to-use SaaS platforms with custom-built components developed only when necessary, such as web search and data cleansing. This way, you can leverage the power of generative AI without developing everything from scratch. Let’s go through the key characteristics and a few example use cases.
This category offers the ability to do more with less—in terms of both cost and development effort. A hybrid system can be a standalone GenAISys or a subsystem within a larger generative AI platform where full-scale development isn’t necessary. Let’s now look at how a small-scope, small-scale GenAISys can take us even further.
A small-scale GenAISys might include an intelligent, GenAI-driven AI controller connected to a vector store. This setup allows the system to retrieve data, trigger instructions, and call additional functionality such as web search or ML—without needing full-scale infrastructure.
This category takes us a step beyond the hybrid approach, while still staying realistic and manageable for small to mid-sized businesses or even individual departments within large organizations.
If you’re working in a team of experts within an organization that has a large budget and advanced infrastructure, this category is for you. Your team can build a full-scale GenAISys that begins to approach the capabilities of ChatGPT-grade platforms.
Think of the GenAISys we’re building in this book as an alpha version—a template that can be cloned, configured, and deployed anywhere in the organization as often as needed.
You can join one of these large organizations that have the resources to build a sustainable GenAISys, whether it be on a cloud platform, local servers, or both.
The three categories—hybrid, small scale, and full scale—offer distinct paths for building a GenAISys, depending on your organization’s goals, budget, and technical capabilities. In this book, we’ll explore the critical components that make up a GenAISys. By the end, you’ll be equipped to contribute to any of these categories and offer realistic, technically grounded recommendations for the projects you work on.
Let’s now lift the hood and begin building contextual awareness and memory retention in code.
In this section, we’ll begin implementing simulations of contextual awareness and memory retention in Python to illustrate the concepts introduced in the Building the memory of a GenAISys section. The goal is to demonstrate practical ways to manage context and memory—two features that are becoming increasingly critical as generative AI platforms evolve.
Open the Contextual_Awareness_and_Memory_Retention.ipynb file located in the chapter01 folder of the GitHub repository (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). You’ll see that the notebook is divided into five main sections:
These resources are stored in the commons directory of the repository, so we can reuse them when necessary throughout the book.
The goal is to illustrate each type of memory in an explicit process. These examples are intentionally kept manual for now, but they will be automated and managed by the AI controller we will begin to build in the next chapter.
Due to the probabilistic nature of generative models, you may observe different outputs for the same prompt across runs. Make sure to run the entire notebook in a single session, as memory retention in this notebook is handled explicitly across different cells. In Chapter 2, this functionality will become persistent and fully managed by the AI controller.
The first step is to install the environment.
We will need a commons directory for our GenAISys project. This directory will contain the main modules and libraries needed across all notebooks in this book’s GitHub repository. The motivation is to focus on designing the system for maintenance and support. As such, by grouping the main modules and libraries in one directory, we can zero in on a resource that requires our attention instead of repeating the setup steps in every notebook. Furthermore, this section will serve as a reference point for all the notebooks in this book’s GitHub repository. We’ll only describe the downloading of each resource once and then reuse them throughout the book to build our educational GenAISys.
Thus, we can download the notebook resources from the commons directory and install the requirements.
The first step is to download grequests.py, a utility script we will use throughout the book. It contains a function to download the files we need directly from GitHub:
!curl -L https://raw.githubusercontent.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/master/commons/grequests.py --output grequests.py
Quick tip: Enhance your coding experience with the AI Code Explainer and Quick Copy features. Open this book in the next-gen Packt Reader. Click the Copy button (1) to quickly copy code into your coding environment, or click the Explain button (2) to get the AI assistant to explain a block of code to you.
The next-gen Packt Reader is included for free with the purchase of this book. Scan the QR code or visit packtpub.com/unlock, then use the search bar to find this book by name. Double-check the edition shown to make sure you get the right one.
The goal of this script is to download a file from any directory of the repository by calling the download function from grequests:
import sys
import subprocess
from grequests import download
download([directory],[file])
This function uses a curl command to download files from a specified directory and filename. It also includes basic error handling in case of command execution failures.
The code begins by importing subprocess to handle paths and commands. The download function contains two parameters:
def download(directory, filename):
- directory: The subdirectory of the GitHub repository where the file is stored
- filename: The name of the file to download
The base URL for the GitHub repository is then defined, pointing to the raw files we will need:
base_url = 'https://raw.githubusercontent.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/main/'
We now need to define the file’s full URL with the directory and filename parameters:
file_url = f"{base_url}{directory}/{filename}"
The function now defines the curl command:
curl_command = f'curl -o {filename} {file_url}'
Finally, the download command is executed:
subprocess.run(curl_command, check=True, shell=True)
- check=True raises an exception if the curl command fails
- shell=True runs the command through the shell

The try-except block is used to handle errors:
try:
    # Prepare the curl command with the Authorization header
    curl_command = f'curl -o {filename} {file_url}'
    # Execute the curl command
    subprocess.run(curl_command, check=True, shell=True)
    print(f"Downloaded '{filename}' successfully.")
except subprocess.CalledProcessError:
    print(f"Failed to download '{filename}'. Check the URL and your internet connection.")
We now have a standalone download script that we’ll use throughout the book. Let’s go ahead and download the resources we need for this program.
We need three resources for this notebook:
- requirements01.py to install the precise OpenAI version we want
- openai_setup.py to initialize the OpenAI API key
- openai_api.py, which contains a reusable function for calling the GPT-4o model, so you don't need to rewrite the same code across multiple cells or notebooks
We will be reusing the same functions throughout the book for standard OpenAI API calls. You can come back to this section any time you want to revisit the installation process. Other scenarios will be added to the commons directory when necessary.
We can download these files with the download() function:
from grequests import download
download("commons","requirements01.py")
download("commons","openai_setup.py")
download("commons","openai_api.py")
The first resource is requirements01.py.
requirements01.py makes sure that a specific version of the OpenAI library is installed to avoid conflicts with other installed libraries. The code thus uninstalls existing versions, force-installs the specified version requested, and verifies the result. The function executes the installation with error handling:
def run_command(command):
    try:
        subprocess.check_call(command)
    except subprocess.CalledProcessError as e:
        print(f"Command failed: {' '.join(command)}\nError: {e}")
        sys.exit(1)
The first step for the function is to uninstall the current OpenAI library, if there is one:
print("Uninstalling any existing 'openai' installation...")
run_command([sys.executable, "-m", "pip", "uninstall", "-y", "openai"])
The function then installs a specific version of OpenAI:
run_command(
    [
        sys.executable, "-m", "pip", "install",
        "--force-reinstall", "openai==1.57.1"
    ]
)
Finally, the function verifies that OpenAI is properly installed:
try:
    import openai
    print(f"'openai' version {openai.__version__} is installed.")
except ImportError:
    print("Failed to import the 'openai' library after installation.")
    sys.exit(1)
The output at the end of the function should be as follows:
'openai' version 1.57.1 is installed.
We can now initialize the OpenAI API key.
There are two methods to initialize the OpenAI API key in the notebook:
Figure 1.8: Add a new Google secret key
Then, we can use Google’s function to initialize the key by calling it in our openai_setup function in openai_setup.py:
# Import libraries
import openai
import os
from google.colab import userdata

# Function to initialize the OpenAI API key
def initialize_openai_api():
    # Access the secret by its name
    API_KEY = userdata.get('API_KEY')
    if not API_KEY:
        raise ValueError("API_KEY is not set in userdata!")
    # Set the API key in the environment and OpenAI
    os.environ['OPENAI_API_KEY'] = API_KEY
    openai.api_key = os.getenv("OPENAI_API_KEY")
    print("OpenAI API key initialized successfully.")
This method is activated if google_secrets is set to True:
google_secrets=True
if google_secrets==True:
    import openai_setup
    openai_setup.initialize_openai_api()
If you set google_secrets to False, uncomment the following code and enter your API key directly, or use any other method of your choice:
if google_secrets==False: # Uncomment the code and choose any method you wish to initialize the API_KEY
    import os
    #API_KEY=[YOUR API_KEY]
    #os.environ['OPENAI_API_KEY'] = API_KEY
    #openai.api_key = os.getenv("OPENAI_API_KEY")
    #print("OpenAI API key initialized successfully.")
In both cases, the code will create an environment variable:
os.environ['OPENAI_API_KEY'] = API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
The OpenAI API key is initialized. We will now import a custom OpenAI API call.
The goal next is to create an OpenAI API call function in openai_api.py that we can import in two lines:
#Import the function from the custom OpenAI API file
import openai_api
from openai_api import make_openai_api_call
The function is thus built to receive four variables when making the call and display them seamlessly:
# API function call
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role)
print(response)
The parameters in this function are the following:
- input: Contains the input (user or system), for example, Where is Hawaii?
- mrole: Defines the system's role, for example, system
- mcontent: What we expect the system to be, for example, You are a geology expert.
- user_role: Defines the role of the user, for example, user
The first part of the code in the function defines the model we will be using in this notebook and creates a message object for the API call with the parameters we sent:
def make_openai_api_call(input, mrole, mcontent, user_role):
    # Define parameters
    gmodel = "gpt-4o"
    # Create the messages object
    messages_obj = [
        {
            "role": mrole,
            "content": mcontent
        },
        {
            "role": user_role,
            "content": input
        }
    ]
We then define the API call parameters in a dictionary for this notebook:
# Define all parameters in a dictionary named params:
params = {
    "temperature": 0,
    "max_tokens": 256,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0
}
The dictionary parameters are the following:
- temperature: Controls the randomness of a response. 0 produces deterministic responses; higher values (e.g., 0.7) produce more creative ones.
- max_tokens: Limits the maximum number of tokens in a response.
- top_p: Enables nucleus sampling, which controls the diversity of a response by sampling from the top tokens whose cumulative probability reaches the given threshold (here, 1).
- frequency_penalty: Reduces the repetition of tokens to avoid redundancy. 0 applies no penalty and 2 a strong penalty. In this case, 0 is sufficient because of the high performance of the OpenAI model.
- presence_penalty: Encourages new content by penalizing tokens that have already appeared. It takes the same range of values as the frequency penalty. In this case, the high performance of the OpenAI model makes this control unnecessary.

We then initialize the OpenAI client to create an instance for the API calls:
from openai import OpenAI
client = OpenAI()
Finally, we make the API call by sending the model, the message object, and the unpacked parameters:
# Make the API call
response = client.chat.completions.create(
    model=gmodel,
    messages=messages_obj,
    **params  # Unpack the parameters dictionary
)
The function ends by returning the content of the API’s response that we need:
#Return the response
return response.choices[0].message.content
This function will help us focus on the GenAISys architecture without having to overload the notebook with repetitive libraries and functions.
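Pulled together, these snippets form the whole helper. The following sketch is a reconstruction rather than the book's verbatim openai_api.py: the message assembly is factored into a separate build_messages function (our addition) so it can be exercised without network access, and the OpenAI import is deferred into the call itself.

```python
GMODEL = "gpt-4o"
PARAMS = {
    "temperature": 0, "max_tokens": 256, "top_p": 1,
    "frequency_penalty": 0, "presence_penalty": 0,
}

def build_messages(input, mrole, mcontent, user_role):
    # Assemble the messages object exactly as shown in the snippets above
    return [
        {"role": mrole, "content": mcontent},
        {"role": user_role, "content": input},
    ]

def make_openai_api_call(input, mrole, mcontent, user_role):
    # Deferred import so the module loads even without the SDK installed
    from openai import OpenAI
    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    response = client.chat.completions.create(
        model=GMODEL,
        messages=build_messages(input, mrole, mcontent, user_role),
        **PARAMS,
    )
    return response.choices[0].message.content
```

Splitting out build_messages keeps the pure data-shaping step separate from the network call, which makes the helper easier to check offline.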
In the notebook, we have the following:
- messages_obj contains the conversation history
- The API call parameters are defined in the params dictionary

A GenAISys will contain many components, including a generative model. You can choose the one that fits your project. In this book, the models are used for educational purposes only, not as endorsements or recommendations.
Let’s now build and run a stateless and memoryless session.
A stateless and memoryless session is useful if we only want a single and temporary exchange with no stored information between requests. The examples in this section are both stateless and memoryless:
Let’s begin with a semantic query.
This request expects a purely semantic, factual response:
uinput = "Hawai is on a geological volcano system. Explain:"
mrole = "system"
mcontent = "You are an expert in geology."
user_role = "user"
Now, we call the OpenAI API function:
# Function call
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role)
print(response)
As you can see, the response is purely semantic:
Hawaii is located on a volcanic hotspot in the central Pacific Ocean, which is responsible for the formation of the Hawaiian Islands. This hotspot is a region where magma from deep within the Earth's mantle rises to the surface, creating volcanic activity…
The next query is episodic.
The query in this example is episodic and draws on personal experience. However, there is a semantic undertone because of the description of Hawaii. Here’s the message, which is rather poetic:
# API message
uinput = "I vividly remember my family's move to Hawaii in the 1970s, how they embraced the warmth of its gentle breezes, the joy of finding a steady job, and the serene beauty that surrounded them. Sum this up in one nice sentence from a personal perspective:"
mrole = "system"
mcontent = "You are an expert in geology."
user_role = "user"
mcontent is reused from the semantic query example (“You are an expert in geology”), but in this case, it doesn’t significantly influence the response. Since the user input is highly personal and narrative-driven, the system prompt plays a minimal role.
We could insert external information before the function call if necessary. For example, we could add some information from another source, such as a text message received that day from a family member:
text_message = "Hi, I agree, we had a wonderful time there."
uinput = text_message + uinput
Now, we call the function:
# Call the function
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role)
print(response)
We see that the response is mostly episodic with some semantic information:
Moving to Hawaii in the 1970s was a transformative experience for my family, as they found joy in the island's gentle breezes, the security of steady employment, and the serene beauty that enveloped their new home.
We added no memory retention functionality earlier, making the dialogue stateless. Let’s check:
# API message
uinput = "What question did I just ask you?"
mrole = "system"
mcontent = "You already have this information"
user_role = "user"
When we call the function, our dialogue will be forgotten:
# API function call
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role
)
print(response)
The output confirms that the session is memoryless:
I'm sorry, but I can't recall previous interactions or questions. Could you please repeat your question?
The API call is stateless because the OpenAI API does not retain memory between requests. If we were using ChatGPT directly, the exchanges would be memorized within that session. This has a critical impact on implementation. It means we have to build our own memory mechanisms to give GenAISys stateful behavior. Let’s start with the first layer: short-term memory.
The goal of this section is to emulate a short-term memory session using a two-step process:
User input => Generative model API call => Response
To achieve this first step, we run the session up to the response:
uinput = "Hawai is on a geological volcano system. Explain:"
mrole = "system"
mcontent = "You are an expert in geology."
user_role = "user"
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role)
print(response)
The response’s output is stored in response:
"Hawaii is part of a volcanic system known as a hotspot, which is a region of the Earth's mantle where heat rises as a thermal plume from deep within the Earth. This hotspot is responsible for the formation of the Hawaiian Islands. Here's how the process works:…"
The current dialog session is:
Hawai is on a geological volcano system. Explain:
Sum up your previous response in a short sentence in a maximum of 20 words.

The goal here is to compress the session log. We won't always need to compress dialogues, but in longer sessions, large context windows can pile up quickly. This technique helps keep the token count low, which matters for both cost and performance. In this particular case, we're only managing one response, so we could keep the entire interaction in memory if we wanted to. Still, this example introduces a useful habit for scaling up.
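A minimal sketch of such compression, assuming a rough one-word-per-token budget (real implementations would count tokens with the model's tokenizer); compress_dialog is a hypothetical helper, not part of the book's code:

```python
def compress_dialog(turns, max_words=60):
    """Keep the most recent turns that fit a rough word budget.

    A crude stand-in for token counting: one word is roughly one
    or more tokens. Hypothetical helper for illustration only.
    """
    kept, used = [], 0
    for turn in reversed(turns):
        words = len(turn.split())
        # Always keep at least the latest turn, even if it busts the budget
        if used + words > max_words and kept:
            break
        kept.append(turn)
        used += words
    return " ".join(reversed(kept))
```

Older turns are dropped first, so the most recent context always survives the budget.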
Once the prompt is assembled:
The scenario is illustrated in the code:
ninput = "Sum up your previous response in a short sentence in a maximum of 20 words."
uinput = (
"The current dialog session is: " +
uinput +
response +
ninput
)
response = openai_api.make_openai_api_call(
uinput, mrole, mcontent, user_role
)
print("New response:", "\n\n", uinput, "\n", response)
The output provides a nice, short summary of the dialogue:
New response: Hawaii's islands form from volcanic activity over a stationary hotspot beneath the moving Pacific Plate.
This functionality wasn’t strictly necessary here, but it sets us up for the longer dialogues we’ll encounter later in the book. Next, let’s build a long-term simulation of multiple sessions.
Keep in mind: Since the session is still in-memory only, the conversation would be lost if the notebook disconnects. Nothing is stored on disk or in a database yet.
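A minimal remedy, assuming a JSON file on disk is an acceptable store for now (save_session and load_session are hypothetical helpers; later chapters move persistence into a vector store):

```python
import json
from pathlib import Path

def save_session(messages_obj, path="session.json"):
    # Persist the structured conversation log as JSON
    Path(path).write_text(json.dumps(messages_obj, indent=2))

def load_session(path="session.json"):
    # Return the saved log, or an empty history if none exists yet
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else []
```

With these two calls at the start and end of a notebook run, a disconnect no longer wipes out the dialogue.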
In this section, we’re simulating long-term memory by continuing a conversation from an earlier session. The difference here is that we’re not just remembering a dialogue from a single session—we’re reusing content from a past session to extend the conversation. At this point, the term “session” takes on a broader meaning. In a traditional copilot scenario, one user interacts with one model in one self-contained session. Here, we’re blending sessions and supporting multiple sub-sessions. Multiple users can interact with the model in a shared environment, effectively creating a single global session with branching memory threads. Think of the model as a guest in an ongoing Zoom or Teams meeting. You can ask the AI guest to participate or stay quiet—and when it joins, it may need a recap.
To avoid repeating the first steps of the past conversation, we’re reusing the content from the short-term memory session we just ran. Let’s assume the previous session is over, but we still want to continue from where we left off:
session01=response
print(session01)
The output contains the final response from our short-term memory session:
Hawaii's islands form from volcanic activity over a stationary hotspot beneath the moving Pacific Plate.
The process in this section will build on the previous session, similar to how you’d revisit a conversation with an online copilot after some time away:
Save previous session => Load previous session => Add it to the new session’s scenario
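The three-step flow can be sketched as a single helper that prepends the saved context to the new request (continue_session is our invented name; the notebook does this inline with string concatenation):

```python
def continue_session(previous_summary, new_question,
                     opener="Let's continue our dialog."):
    # Prepend the saved context so the stateless API sees the history
    return f"{opener} {previous_summary} {new_question}"
```

The opener string signals the model that what follows is prior context rather than a fresh question.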
Let’s first test whether the API remembers anything on its own:
uinput="Is it safe to go there on vacation"
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role
)
print(response)
The output shows that it forgot the conversation we were in:
I'm sorry, but I need more information to provide a helpful response. Could you specify the location you're considering for your vacation? …
The API forgot the previous call because stateless APIs don’t retain past dialogue. It’s up to us to decide what to include in the prompt. We have a few choices:
In a real GenAISys, when an input triggers a request, the AI controller decides which is the best strategy to apply to a task. The code now associates the previous session’s context and memory with a new request:
ninput = "Let's continue our dialog."
uinput=ninput + session01 + "Would it be safe to go there on vacation?"
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role
)
print("Dialog:", uinput,"\n")
print("Response:", response)
The response shows that the system now remembers the past session and has enough information to provide an acceptable output:
Response: Hawaii is generally considered a safe destination for vacation, despite its volcanic activity. The Hawaiian Islands are formed by a hotspot beneath the Pacific Plate, which creates volcanoes as the plate moves over it. While volcanic activity is a natural and ongoing process in Hawaii, it is closely monitored by the United States Geological Survey (USGS) and other agencies…
Let’s now build a long-term simulation of multiple sessions across different topics.
This section illustrates how to merge two separate sessions into one. This isn’t something standard ChatGPT-like platforms offer. Typically, when we start a new topic, the copilot only remembers what’s happened in the current session. But in a corporate environment, we may need more flexibility—especially when multiple users are collaborating. In such cases, the AI controller can be configured to allow groups of users to view and merge sessions generated by others in the same group.
Let’s say we want to sum up two separate conversations—one about Hawaii’s volcanic systems, and another about organizing a geological field trip to Arizona. We begin by saving the previous long-term memory session:
session02=uinput + response
print(session02)
Then we can start a separate multi-user sub-session from another location, Arizona:
ninput ="I would like to organize a geological visit in Arizona."
uinput=ninput+"Where should I start?"
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role
)
#print("Dialog:", uinput,"\n")
We now expect a response on Arizona, leaving Hawaii out:
Response: Organizing a geological visit in Arizona is a fantastic idea, as the state is rich in diverse geological features. Here's a step-by-step guide to help you plan your trip:…
The response is acceptable. Now, let’s simulate long-term memory across multiple topics by combining both sessions and prompting the system to summarize them:
session02=response
ninput="Sum up this dialog in a short paragraph:"
uinput=ninput+ session01 + session02
response = openai_api.make_openai_api_call(
uinput,mrole,mcontent,user_role
)
#print("Dialog:", uinput,"\n")#optional
print("Response:", response)
The system’s output shows that the long-term memory of the system is effective. We see that the first part is about Hawaii:
Response: The dialog begins by explaining the formation of Hawaii's volcanic islands as the Pacific Plate moves over a stationary hotspot, leading to active volcanoes like Kilauea….
Then the response continues to the part about Arizona:
It then transitions to planning a geological visit to Arizona, emphasizing the state's diverse geological features. The guide recommends researching key sites such as the Grand Canyon…
We’ve now covered the core memory modes of GenAISys—from stateless and short-term memory to multi-user, multi-topic long-term memory. Let’s now summarize the chapter’s journey and move to the next level!
A business-ready GenAISys offers functionality on par with ChatGPT-like platforms. It brings together generative AI models, agentic features, RAG, memory retention, and a range of ML and non-AI functions—all coordinated by an AI controller. Unlike traditional pipelines, the controller doesn’t follow a fixed sequence of steps. Instead, it orchestrates tasks dynamically, adapting to the context.
A GenAISys typically runs on a model such as GPT-4o—or whichever model best fits your use case. But as we’ve seen, just having access to an API isn’t enough. Contextual awareness and memory retention are essential. While ChatGPT-like tools offer these features by default, we have to build them ourselves when creating custom systems.
We explored four types of memory: memoryless, short-term, long-term, and cross-topic. We also distinguished semantic memory (facts) from episodic memory (personal, time-stamped information). Context awareness depends heavily on memory—but context windows have limits. Even if we increase the window size, models can still miss the nuance in complex tasks. That’s where advanced RAG comes in—breaking down content into smaller chunks, embedding them, and storing them in vector stores such as Pinecone. This expands what the system can “remember” and use for reasoning.
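As a rough illustration of the chunking step, here is a word-based splitter with overlap (illustrative only: production chunkers typically count tokens and respect sentence boundaries, and the embedding and Pinecone upsert steps are omitted):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word chunks for embedding.

    Assumes chunk_size > overlap. The overlap preserves context
    across chunk boundaries so a fact split mid-chunk is still
    retrievable from its neighbor.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be embedded and stored in the vector store, where similarity search brings the relevant pieces back into the context window at query time.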
We also saw that no matter how advanced GenAISys becomes, it can’t function without human expertise. From design to deployment, maintenance, and iteration, people remain critical throughout the system’s life cycle. We then outlined three real-world implementation models based on available resources and goals: hybrid systems that leverage existing AI platforms, small-scale systems for targeted business needs, and full-scale systems built for ChatGPT-grade performance.
Finally, we got hands-on—building a series of memory simulation modules in Python using GPT-4o. These examples laid the groundwork for what comes next: the AI controller that will manage memory, context, and orchestration across your GenAISys. We are now ready to build a GenAISys AI controller!
Unlock this book's exclusive benefits now: scan the QR code or go to packtpub.com/unlock, then search for this book by name.

Note: Keep your purchase invoice ready before you start.
A generative AI system (GenAISys)’s controller requires two key components: a conversational agent and an orchestrator. The conversational agent—powered by a generative AI model—interacts with human users and system processes. The orchestrator, on the other hand, is a set of generative AI and non-AI functions, such as managing user roles, content generation, activating machine learning algorithms, and running classical queries. We need both to build a functional GenAISys.
If we examine this architecture closely, we’ll see that software orchestrators and user interfaces date back to the first computers. Any operating system, with even basic functionality, has orchestrators that trigger disk space alerts, memory usage, and hundreds of other functions. Today’s user interfaces are intuitive and have event-driven functionality, but at a high level, the underlying architecture of a GenAISys still echoes decades of software design principles. So, what sets a classical software controller apart from a GenAISys controller?
We can sum up the difference in one word: adaptability. In a classical software controller, a sequence of tasks is more or less hardcoded. But in a GenAISys, the user interface is a conversational AI agent that is flexible, and the generative AI model behind it is pre-trained to respond to a wide range of requests with no additional coding. Furthermore, the orchestrator isn’t locked into static flows either; it can modify the tasks it triggers based on the user (human or system) prompts.
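A toy contrast makes the point: instead of one hardcoded path, the route is chosen from the request itself. The keyword list and function names below are invented for this sketch; the book's controller will make this decision with a generative model and a vector store rather than keywords:

```python
def sentiment_analysis(prompt):
    # Stand-in for function F1
    return f"[F1 sentiment analysis of: {prompt}]"

def semantic_analysis(prompt):
    # Stand-in for function F2
    return f"[F2 semantic analysis of: {prompt}]"

def orchestrate(prompt):
    # A classical controller would hardcode one path; here the route
    # adapts to the content of the request
    feelings = ("feel", "happy", "sad", "angry", "love")
    if any(word in prompt.lower() for word in feelings):
        return sentiment_analysis(prompt)
    return semantic_analysis(prompt)
```

Swapping the keyword test for a model-driven classifier is exactly the upgrade the rest of the chapter builds toward.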
In this chapter, we’ll take a hands-on approach to building a custom GenAISys based on the architecture of a GenAISys defined in the previous chapter. We’ll begin by defining the structure of our AI controller in Python, breaking it into two parts—the conversational agent and the orchestrator—and exploring how the two interact. Then, we’ll build the conversational agent using GPT-4o. We’ll automate the contextual awareness and memory retention features from Chapter 1. Our system will support both short-term and long-term memory, as well as multi-user and cross-session capabilities—pushing it beyond what standard copilots typically offer.
Finally, we will build the structure of an AI controller to interpret user input and trigger a response scenario. The response will be a sentiment analysis or a semantic (hard science) analysis, depending on the context of what the AI controller will analyze and manage. Our custom GenAISys will lay the groundwork for domain-specific RAG, something a standard ChatGPT-grade system can’t offer when you’re working with large volumes of data, especially in cases of daily dataset updates, such as the daily sales of a product or service. By the end of this chapter, you’ll know how to build the foundations of a GenAISys AI controller that we will enhance throughout the book.
To sum up, this chapter covers the following topics:
Let’s begin by defining the architecture of the AI controller.
We’ll continue to implement the architecture of GenAISys as we’ve defined in Figure 1.1 from Chapter 1. Figure 2.1, on the other hand, takes us further into the underlying functions of a GenAISys.
Figure 2.1: Defining the functions to build
We established in the previous chapter that human roles are essential, and the preceding figure acknowledges that fact. We are the core of a GenAISys, no matter how advanced the building blocks (models or frameworks) are. Our first task is designing using our human creativity to find effective ways to implement a GenAISys controller. GenAISys needs human creativity, judgment, and technical decision-making. Under the hood of seamless copilots such as ChatGPT, Gemini, and Microsoft Copilot lie intricate layers of AI and non-AI logic. If we want to build our own ChatGPT-like system, we humans need to do the heavy lifting!
We will build two separate programs:
In this chapter, we’ll focus on two scenarios: sentiment analysis and semantic (hard science) analysis, which correspond to functions F1 and F2 in our architecture. Functions F4 and F5 will be added in Chapter 3.
Although these examples are built for OpenAI’s API, the logic is model-agnostic. Once you understand how it works, you can adapt the code to use any LLM—such as Meta’s Llama, xAI’s Grok, Google’s Gemini, or Cohere.
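One way to keep call sites model-agnostic is a small provider registry; only a stub provider is wired in below, since each vendor SDK differs (this registry pattern is our suggestion, not code from the book):

```python
PROVIDERS = {}

def register(name):
    # Decorator: add a provider-specific call function to the registry
    def wrap(fn):
        PROVIDERS[name] = fn
        return fn
    return wrap

@register("stub")
def stub_call(messages, **params):
    # Placeholder provider used for offline testing; a real entry
    # would wrap the OpenAI, Gemini, Cohere, etc. SDK call here
    return f"stub reply to: {messages[-1]['content']}"

def make_api_call(messages, provider="stub", **params):
    # The rest of the system calls this one function, regardless of vendor
    return PROVIDERS[provider](messages, **params)
```

Adding a new LLM then means registering one function, with no changes to the orchestrator or the conversational agent.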
Once we’ve built the conversational agent and controller orchestrator programs separately, we will merge them into a unified intelligence AI controller, as shown in Figure 2.2.
Figure 2.2: Next steps—integrating the AI controller functions through a Pinecone vector store
For now, we need to focus on building each component individually so we can fully understand their behavior. Once that foundation is in place, in Chapter 3, we will merge them through a Pinecone vector store. Let’s now dive straight down into code and begin developing the conversational agent.
Our two primary goals for this section are to build a conversational AI agent with the following:
To get started, open Conversational_AI_Agent.ipynb in this chapter’s GitHub directory (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). This notebook will guide you through the environment setup.
We’ll reuse the setup process from the previous chapter. If you need a refresher, feel free to revisit that section. Start by installing OpenAI and downloading the required files:
!curl -L https://raw.githubusercontent.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/master/commons/grequests.py --output grequests.py
from grequests import download
download("commons","requirements01.py")
download("commons","openai_setup.py")
download("commons","openai_api.py")
We’ll also need to download two additional functions to build our conversational agent:
- download("commons","conversational_agent.py"): This contains the functions to manage a full-turn conversation loop and memorize the dialogue.
- download("commons","processing_conversations.py"): This contains tools to load, display, and cleanse past conversations to increase the memory span of the conversational agent across several sessions and users. This custom multisession, multi-user feature goes beyond the scope of standard ChatGPT-like copilots.

Let's now move on to implementing the functions in conversational_agent.py, which we'll call throughout our sessions with the conversational AI agent.
The conversational AI agent contains two main parts: the initial conversation and the full-turn conversation loop.
The initial conversation marks the entry point for a new session. It’s handled by the AI controller and illustrated in Figure 2.3.
Figure 2.3: The initial conversation controller
We will go through each step of the initial conversation with the generative AI model to understand in detail how a small-scale ChatGPT-like conversational agent works. The 10-step process begins with Start.
The program begins at this entry point through the run_conversational_agent function in conversational_agent.py, which the notebook calls with its parameters:
# Start the conversational agent
def run_conversational_agent(
    uinput, mrole, mcontent, user_role, user_name
):
    conversational_agent(uinput, mrole, mcontent, user_role, user_name)
The parameters the conversational agent will process in this case are the following:
- uinput: Contains the input (user or system), for example, Where is Hawaii?.
- mrole: Defines the role of the message. It can be user or system. You can also assign other roles that the API will interpret, such as defining the AI's persona, for example, You are a geology expert.
- mcontent: Is what we expect the system to be, for example, You are a geology expert.
- user_role: Defines the role of the user, for example, user.
- user_name: The name of the user, for example, John.

messages_obj is initialized with the parameters of the conversation described in the previous step, Starting the conversation:
messages_obj = [{"role": mrole, "content": mcontent}]
messages_obj is focusing on the memory of the system. This object will be appended as long as the session lasts with the exchanges with the GPT-4o model. It will be used to log conversations between sessions. The first message contains the role and content for setting up the agent’s context.
The system is now ready to interact with users. The agent first displays a welcome message and explains how to exit the system once the conversation is over:
print("Welcome to the conversational agent! Type 'q' or 'quit' to end the conversation.")
The user’s initial input is added to messages_obj to provide the agent with memory and provide the direction the agent is expected to follow. The initial user input will be sent from the conversational agent:
if initial_user_input:
    print(f"{user_name}: {initial_user_input}")
    messages_obj.append(
        {"role": user_role, "content": initial_user_input}
    )
messages_obj holds the conversation’s history in a structured format. For certain operations within our application, such as generating a simplified display, creating a consolidated log entry, or preparing input for a text-based function, we need to convert this structured log into a single, continuous string. This makes sure that the data is in the correct format for these specific tasks and helps resolve any potential punctuation or formatting quirks that might arise when combining the different message parts:
conversation_string = cleanse_conversation_log(messages_obj)
The cleansing function cleans the conversation and returns a string:
def cleanse_conversation_log(messages_obj):
    conversation_str = " ".join(
        [f"{entry['role']}: {entry['content']}" for entry in messages_obj]
    )
    # Remove problematic punctuation
    return re.sub(r"[^\w\s,.?!:]", "", conversation_str)
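As a quick offline check, here is the cleansing function run on a small sample log (the sample messages are illustrative, not from the chapter’s session):

```python
import re

def cleanse_conversation_log(messages_obj):
    # Join each role/content pair into one string
    conversation_str = " ".join(
        [f"{entry['role']}: {entry['content']}" for entry in messages_obj]
    )
    # Remove anything that is not a word character, whitespace, or , . ? ! :
    return re.sub(r"[^\w\s,.?!:]", "", conversation_str)

# Illustrative sample log
messages_obj = [
    {"role": "system", "content": "You are an expert in geology."},
    {"role": "user", "content": 'Explain "hotspots" (volcanic), please!'},
]
result = cleanse_conversation_log(messages_obj)
print(result)
```

The quotes and parentheses are stripped, while commas, periods, and colons survive, which keeps the role labels readable.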
The cleansed conversation string is sent to the API for processing. The API provides a response based on the last input and the conversation history. The system now has a memory:
agent_response = make_openai_api_call(
    input=conversation_string,
    mrole=mrole,
    mcontent=mcontent,
    user_role=user_role
)
The assistant’s response from the API is processed and appended to messages_obj. We are continuing to increase the system’s memory and, thus, its contextual awareness:
messages_obj.append({"role": "assistant", "content": agent_response})
The system’s response is displayed for the user to analyze and decide whether to continue or exit the session:
print(f"Agent: {agent_response}")
The system now enters the conversation loop, where multiple dialogue turns can take place until the user decides to exit the session:
while True:
    user_input = input(f"{user_name}: ")
    if user_input.lower() in ["q", "quit"]:
        print("Exiting the conversation. Goodbye!")
        break
We are now ready to begin a full-turn conversation loop.
The initial conversation is now initialized. We will enter the full-turn conversation loop starting from step 11 onward, as illustrated in Figure 2.4.
Figure 2.4: The conversation loop starting from step 11
The conversation continues the initial dialogue and is memorized through the messages object. The user prompt triggers a full-turn conversation loop. The first step is to enter the user’s name. This customization takes us beyond the standard ChatGPT-like conversational agents that are limited to one user per session. We are initializing a multi-user conversation:
user_input = input(f"{user_name}: ")
If q or quit is entered, the session is ended:
if user_input.lower() in ["q", "quit"]:
    print("Exiting the conversation. Goodbye!")
    break
The system is now equipped with a memory of a full-turn conversation loop. It uses the generic API format we defined. The user’s input is appended to messages_obj:
messages_obj.append({"role": user_role, "content": user_input})
The updated messages_obj is cleansed to make sure it complies with the API calls, as in step 6, Cleansing the initial conversation log:
conversation_string = cleanse_conversation_log(messages_obj)
In this full-turn conversation loop, the whole conversation is sent to the API. The API will thus return a response based on the context of the whole conversation and the new input:
agent_response = make_openai_api_call(
    input=conversation_string,
    mrole=mrole,
    mcontent=mcontent,
    user_role=user_role
)
The API’s response is appended to messages_obj at each conversation turn:
messages_obj.append({"role": "assistant", "content": agent_response})
The API response is displayed at each conversation turn in the loop:
print(f"Agent: {agent_response}")
When a user exits the loop, the conversation is saved. This feature will replicate a ChatGPT-like platform that can save dialogue between two sessions with the same user. However, as we will see in our implementation of a conversational agent in the Running the conversational agent section, our program will be able to save a multi-user session in a conversation between team members:
with open("conversation_log.txt", "w") as log_file:
log_file.write("\n".join([f"{(user_name if entry['role'] == 'user' else entry['role'])}: {entry['content']}" for entry in messages_obj]))
The conversational agent terminates the session after memorizing the conversation:
print("Conversation saved to 'conversation_log.txt'.")
We have explored the conversational agent’s functionality.
Now, let’s move on to the AI conversational agent program that represents an AI controller.
The main program, Conversational_AI_Agent.ipynb, calls the necessary functions from conversational_agent.py to handle AI interactions. We will be running a conversation through three user sessions with this scenario:
All three users interact in successive sessions. In Chapter 3, we’ll go further by grouping users through a Pinecone vector store so that multiple users can participate together in a session in real time. For the moment, let’s walk through this multi-user setup step by step and see how the conversational AI agent handles these sessions. Let’s begin with the first step: John’s short-term memory session.
The session begins with the parameters described in step 1, Starting the conversation, of the conversational agent:
uinput = "Hawai is on a geological volcano system. Explain:"
mrole = "system"
mcontent = "You are an expert in geology."
user_role = "user"
We also add the name of the user, as in a ChatGPT-like session:
user_name = "John"
This simple addition—user_name—is what takes our GenAISys beyond standard ChatGPT-like platforms. It allows us to associate memory with specific users and expand into multi-user conversations within a single system.
We will now import the first function, the OpenAI API functionality, to make a request to OpenAI’s API, as described in Chapter 1:
from openai_api import make_openai_api_call
The program now imports the second function, the conversational agent, and runs it as described earlier in this section:
from conversational_agent import run_conversational_agent
run_conversational_agent(uinput, mrole, mcontent, user_role, user_name)
Let’s go through each step of the dialog implemented with our two functions. The agent first welcomes us:
Welcome to the conversational agent! Type 'q' or 'quit' to end the conversation.
John, the first user, asks for a geological explanation about Hawaii:
John: Hawai is on a geological volcano system. Explain:
The agent provides a satisfactory answer:
Agent: Hawaii is part of a geological volcanic system known as a "hotspot"…
John now asks about surfing “there”:
John: Can we surf there?
Thanks to the memory we built into the agent, it now has contextual awareness through memory retention. The agent correctly responds about surfing in Hawaii:
Agent: Yes, you can definitely surf in Hawaii! The Hawaiian Islands are renowned …
John now asks about the best places to stay without mentioning Hawaii:
John: Where are the best places to stay?
The agent answers correctly using contextual awareness:
Agent: Hawaii offers a wide range of accommodations …
John then quits the session:
John: quit
The agent exits the conversation and saves the dialogue in a conversation log:
Agent: Exiting the conversation. Goodbye!
Conversation saved to 'conversation_log.txt'.
The short-term session ends, but thanks to memory retention via the automatically generated conversation_log.txt file, we can pick up from where John left off and continue the dialogue immediately or at a later time.
The short-term session is saved. We have three options:
- Continue John’s session: conversation_log.txt will only contain John’s session, which can be continued or not.
- Initialize a separate conversation_log.txt for the next user, Myriam.
- Continue a multi-session, multi-user scenario.

The program in this chapter chooses to continue the multi-session, multi-user scenario.
The first step in continuing the conversation with John is to load and display the conversation log using the function in processing_conversations.py that we downloaded in the Setting up the environment section. We now import and run that function:
from processing_conversations import load_and_display_conversation_log
conversation_log = load_and_display_conversation_log()
The function is a standard IPython process using HTML functionality that reads and displays the conversation:
from IPython.core.display import display, HTML
import re
# Step 1: Load and Display Conversation Log
def load_and_display_conversation_log():
    try:
        with open("conversation_log.txt", "r") as log_file:
            conversation_log = log_file.readlines()
        # Prepare HTML for display
        html_content = "<h3>Loaded Conversation Log</h3><table border='1'>"
        for line in conversation_log:
            html_content += f"<tr><td>{line.strip()}</td></tr>"
        html_content += "</table>"
        # Display the HTML
        display(HTML(html_content))
        return conversation_log
    except FileNotFoundError:
        print("Error: conversation_log.txt not found. Ensure it exists in the current directory.")
        return []
The output displays each participant in the conversation, beginning with the system’s information, followed by John’s request, and then the GPT-4o assistant’s response at each turn:
system: You are an expert in geology.
John: Hawai is on a geological volcano system. Explain:
assistant: Hawaii is part of a geological volcanic system…
Before adding the conversation to the context of the next input, we will clean and prepare it. To achieve this, we successively import cleanse_conversation_log and initialize_uinput from processing_conversations.py:
from processing_conversations import cleanse_conversation_log
from processing_conversations import initialize_uinput
Then, we will call the two Python functions that we defined to cleanse and then prepare the new input:
cleansed_log = cleanse_conversation_log(conversation_log)
nuinput = initialize_uinput(cleansed_log)
The cleanse function removes punctuation and potentially problematic characters:
# Step 2: Clean the conversation log by removing punctuations and special characters
def cleanse_conversation_log(conversation_log):
    cleansed_log = []
    for line in conversation_log:
        # Remove problematic punctuation and special characters
        cleansed_line = re.sub(r"[^\w\s,.?!:]", "", line)
        cleansed_log.append(cleansed_line.strip())
    return " ".join(cleansed_log)  # Combine all lines into a single string
Finally, we initialize the new input:
# Step 3: Initialize `uinput` with the cleansed conversation log to continue the conversation
def initialize_uinput(cleansed_log):
    if cleansed_log:
        print("\nCleansed conversation log for continuation:")
        print(cleansed_log)
        return cleansed_log  # Use the cleansed log as the new input
    else:
        print("Error: No data available to initialize `uinput`.")
        return ""
The output confirms that the conversation log has been cleansed:
Cleansed conversation log for continuation:
system: You are an expert in geology…
Then, the output confirms that nuinput contains the conversation log for continuation:
# `nuinput` now contains the cleansed version of the conversation log and can be used
print("\nInitialized `nuinput` for continuation:", nuinput)
We can now continue the conversation that John began with nuinput as the memory retention variable for contextual awareness. We will add the context, nuinput, to Myriam’s request using the message variables as before:
ninput = nuinput + "What about surfing in Long Beach"
mrole = "system"
mcontent = "You are an expert in geology."
user_role = "user"
user_name = "Myriam"
The message call contains two key features:
- ninput = nuinput + [user input], which shows that the AI controller now has a long-term memory that goes beyond a single session
- user_name = "Myriam", which shows the multi-user feature, proving that our custom small-scale ChatGPT-like AI controller has more flexibility than a standard copilot
The overall process is the same as with John. Myriam asks a question:
Myriam: What about surfing in Long Beach
The agent responds:
Agent: Long Beach, California, offers a different surfing experience compared to Hawai…
Myriam quits:
Myriam: quit
The agent confirms that the conversation has ended and is saved to the conversation log:
Agent: Exiting the conversation. Goodbye!
Conversation saved to 'conversation_log.txt'.
The AI controller now has a log of John’s session and Myriam’s continuation of the session. The controller can take this further and add yet another user to the conversation.
Let’s add Bob to the mix to continue the conversation. First, display the conversation log again:
# Run the process
conversation_log = load_and_display_conversation_log()
You’ll see entries for both John and Myriam:
system: You are an expert in geology.
Myriam: system: You are an expert …
The log is then cleansed and prepared for the next turn of the conversation as previously. nuinput now contains John and Myriam’s sessions:
uinput = nuinput + "Read the whole dialog then choose the best for geology research"
mrole = "system"
mcontent = "You are an expert in geology."
user_role = "user"
user_name = "Bob"
Bob is focused on the geological mission, not leisure:
Bob:"Read the whole dialog then choose the best for geology research"
The AI agent provides an accurate response:
Agent: For geology research, the most relevant part of the dialogue is the explanation of Hawaii's geological volcanic system. This section provides detailed insights into the Hawaiian hotspot, mantle plumes, volcanic activity,…
Bob then quits the session:
Bob: quit
The agent exits the conversation and saves it in the conversation log:
Agent: Exiting the conversation. Goodbye!
Conversation saved to 'conversation_log.txt'.
With these three scenarios, we have implemented a conversational agent managed by the AI controller in a multi-user full-turn conversational loop. Let’s examine the next steps for this conversational agent.
At this point, we have the basic structure of a conversational agent. We need to integrate it into an AI controller orchestrator. Let’s sum up the work we did for the conversational agent before beginning to build the AI controller orchestrator.
Figure 2.5: The cycle of a conversational agent loop
As illustrated in the preceding figure, the AI conversational agent does the following:
However, the entry/exit point is incomplete. We can enter and exit the conversation but cannot call functions to orchestrate tasks such as activating sentiment analysis and semantic analysis. To complete the architecture of the AI controller, we need to begin building the AI controller orchestrator.
In this section, we will build the first component of our AI controller orchestrator: the ability to select the right task to perform. We develop it as a standalone component that we will integrate starting from Chapter 3, where we will bridge the conversational agent with the AI controller orchestrator through a Pinecone vector store.
Figure 2.6 illustrates the workflow of the AI controller orchestrator we’ll be developing:
Figure 2.6: Workflow of the AI controller orchestrator
We’ll develop this first component of the AI controller orchestrator with OpenAI’s GPT-4o API and Python. Additionally, since the idea is to leverage the full power of the generative AI model to perform the several tasks requested by the AI controller orchestrator, we will avoid overloading the orchestrator with additional libraries and focus on the architecture of the GenAISys.
In this notebook, GPT-4o will perform three key functions in the program, as shown in Figure 2.7:
Figure 2.7: Triggering tasks with similarity searches in a list of instructions
We have defined the workflow of the orchestrator and the generative AI model’s usage. However, we must examine how a model identifies the task it is expected to perform.
No matter how powerful a generative AI model such as GPT-4o is, it cannot guess what a user wants without a prompt that explicitly expresses intent. We cannot just say, “The Grand Canyon is a great place to visit in Arizona” and expect the model to guess that we want a sentiment analysis done on our statement. We have to explicitly formulate our intent by entering: “Provide a sentiment analysis of the following text: The Grand Canyon is a great place to visit in Arizona.”
To resolve the issue of intent for an AI controller, we have to find a framework for it to orchestrate its tasks. A good place to start is to study the Text-to-Text Transfer Transformer (T5), which is a text-to-text model (Raffel et al., 2020). A T5 model uses task tags or task-specific prefixes to provide the intent of a prompt to the transformer model. A task tag contains instructions such as summarization, translation, and classification. The model will detect the tag and know what to do, as shown in Figure 2.8.
Figure 2.8: T5 with task tags
Training a T5 model involves explicitly adding a task tag when creating an input and then providing the response. However, OpenAI GPT models learn which task to perform by analyzing billions of sequences of language, not explicit structures, that contain instructions and responses. A generative AI model using GPT-like architectures will thus learn which task to perform implicitly through the context of the prompt. For example, a well-parsed prompt such as “Provide a sentiment analysis of the following text: The Grand Canyon is a great place to visit in Arizona.” contains enough context for GPT-4o to infer the desired operation—without requiring an explicit tag.
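For illustration, a T5-style task prefix can be emulated with plain string formatting. The tag names and prefix strings below are hypothetical examples chosen for this sketch, not part of the chapter’s code:

```python
# Hypothetical task tags expressed as natural-language prefixes,
# the GPT-style analogue of T5's "summarize:"-style prefixes
TASK_TAGS = {
    "sentiment": "Provide a sentiment analysis of the following text:",
    "summary": "Summarize this text in 10 words maximum:",
    "grammar": "Is the following sentence grammatically correct:",
}

def tag_prompt(task, text):
    # Prepend the natural-language task tag to the raw text
    return f"{TASK_TAGS[task]} {text}"

prompt = tag_prompt(
    "sentiment", "The Grand Canyon is a great place to visit in Arizona."
)
print(prompt)
```

The resulting string is the explicit-intent prompt that a GPT-style model can act on without any task-specific fine-tuning.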
Let’s illustrate how a GPT model works by running T5-style examples with GPT-4o’s implicit analysis of which task needs to be performed.
In this section, we’ll write a program to show how GPT-4o interprets instructions—a capability we’ll leverage in our orchestrator. The aim is to demonstrate that, although GPT-style models infer intent implicitly, they still need clear instructions.
We’ll begin by opening T52GPT.ipynb in the Chapter02 directory on GitHub. Set up the environment exactly as in the Setting up the environment subsection of the Conversational AI agent section, installing only the OpenAI environment:
download("commons","requirements01.py")
download("commons","openai_setup.py")
download("commons","openai_api.py")
No additional installations are required. Let’s now begin with a CoLA task.
The Corpus of Linguistic Acceptability (CoLA) is a public dataset of short English sentences, each tagged as acceptable (grammatical) or unacceptable (ungrammatical). By testing GPT-4o on these examples, we can show that advanced generative models can tackle new tasks purely by understanding language, without any task-specific fine-tuning. This means that we can apply advanced generative AI models to a wide range of tasks we didn’t train them for.
Let’s first submit the following input to the GPT-4o model to see whether it is acceptable without an explicit task tag:
input="This aint the right way to talk."
We will provide minimal information to the system:
mrole = "system"
user_role = "user"
mcontent = "Follow the instructions in the input"
We’ll also make an OpenAI API call with the function we have been using throughout this chapter:
# API function call
task_response = openai_api.make_openai_api_call(
    input, mrole, mcontent, user_role
)
print(task_response)
The output shows that even one of the most powerful generative AI models doesn’t have a clue about what to do without a task tag:
I apologize if my previous response didn't meet your expectations. Please let me know how I can assist you better!
Now, let’s write an instruction with a task tag and the same message:
input="Is the following sentence gramatically correct:This aint the right way to talk."
mrole = "system"
user_role = "user"
mcontent = "Follow the instructions in the input"
# API function call
task_response = openai_api.make_openai_api_call(
    input, mrole, mcontent, user_role
)
print(task_response)
The input now contains an indication of what is expected of the generative AI model. The output is now accurate:
The sentence "This aint the right way to talk." is not grammatically correct. The response corrects the sentence:
"This isn't the right way to talk."
Alternatively, if you want to maintain the informal tone, you could write:
"This ain't the right way to talk."
Note that "ain't" is considered informal and nonstandard in formal writing.
Let’s now perform a translation task.
The task begins with a task tag that is expressed in natural language:
input="Translate this sentence into French: Paris is quite a city to visit."
mrole = "system"
user_role = "user"
mcontent = "Follow the instructions in the input"
# API function call
task_response = openai_api.make_openai_api_call(
    input, mrole, mcontent, user_role
)
print(task_response)
The output we get is accurate:
Paris est vraiment une ville à visiter.
Let’s now perform a Semantic Textual Similarity Benchmark (STSB) task.
STSB-style scoring is an important feature for a GenAISys AI controller, which depends on similarity searches to pick the right instruction scenarios, documents, and other resources. The orchestrator will rely on this very capability. In the test that follows, we submit two sentences to the model and ask it to judge their semantic similarity:
input="stsb:Sentence 1: This is a big dog. Sentence 2: This dog is very big."
mrole = "system"
user_role = "user"
mcontent = "Follow the instructions in the input"
# API function call
task_response = openai_api.make_openai_api_call(
    input, mrole, mcontent, user_role)
print(task_response)
The output we get is accurate:
The sentences "This is a big dog." and "This dog is very big." are semantically similar. Both sentences convey the idea that the dog in question is large in size. The difference in wording does not significantly alter the meaning, as both sentences describe the same characteristic of the dog.
This function will prove to be very useful when we’re searching for data that matches the input in a dataset. Let’s now run a summarization task.
In the following input, GPT-4o can detect the summarization instruction tag and also interpret the maximum length of the response required:
input="Summarize this text in 10 words maximum: The group walked in the forest on a nice sunny day. The birds were singing and everyone was happy."
mrole = "system"
user_role = "user"
mcontent = "Follow the instructions in the input"
# API function call
task_response = openai_api.make_openai_api_call(
    input, mrole, mcontent, user_role)
print(task_response)
The output is once again accurate:
Group enjoyed a sunny forest walk with singing birds.
The takeaway of this exploration is that no matter which generative AI model we implement, it requires task tags to react as we expect. Next, we’ll use this insight to implement semantic textual similarity in our orchestrator for processing task tags.
In this section, we will begin building the orchestrator for two instructions based on task tags, as shown in Figure 2.9: sentiment analysis to determine the sentiment of a sentence and semantic analysis to analyze the facts in a sentence.
We will make the system more complex by asking the generative AI model to find the best task tag scenario (sentiment or semantic analysis) based on the input. In other words, the task tag will not be part of the input. We will use GPT-4o’s semantic textual similarity features to choose the right task tag itself.
Figure 2.9: Running tasks with implicit task tags
Eventually, our orchestrator will support any task (see 3. Any Task required in Figure 2.9), not just sentiment or semantic analysis.
Setting up the environment is the same as earlier:
download("commons","requirements01.py")
download("commons","openai_setup.py")
download("commons","openai_api.py")
No additional installations are required for the orchestrator. We will begin by implementing an instruction scenario selection.
The core of an AI controller is to decide what to do when it receives an input (system or human user). The selection of a task opens a world of possible methods that we will explore throughout the book. However, we can classify them into two categories:
Here, we’ll explore the second, more proactive approach. We’ll test two prompts with no instructions, no task tag, and no clue as to what is expected of the generative AI model. Although we will implement other, more explicit approaches later with task tags, a GenAISys AI controller orchestrator must be able to be proactive in certain situations.
if prompt == 1:
    input = "Gladiator II is a great movie although I didn't like some of the scenes. I liked the actors though. Overall I really enjoyed the experience."
if prompt == 2:
    input = "Generative AI models such as GPT-4o can be built into Generative AI Systems. Provide more information."
To provide the AI controller with decision-making capabilities, we will need a repository of instruction scenarios.
Scenarios are sets of instructions that live in a repository within a GenAISys. While ChatGPT-like models are trained to process many instructions natively, domain-specific use cases need custom scenarios (we’ll dive into these starting from Chapter 5). For example, a GenAISys could receive a message such as Customer order #9283444 is late. The message could be about a production delay or a delivery delay. By examining the sender’s username and group (production or delivery department), the AI controller can determine the context and, selecting a scenario, take an appropriate decision.
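A minimal sketch of that disambiguation, using hypothetical group names and scenario labels (none of these appear in the chapter’s code):

```python
# Hypothetical mapping from a sender's department to a scenario label
GROUP_TO_SCENARIO = {
    "production": "production delay analysis",
    "delivery": "delivery delay analysis",
}

def select_scenario(message, sender_group):
    # "Customer order #9283444 is late" is ambiguous on its own;
    # the sender's group supplies the missing context
    return GROUP_TO_SCENARIO.get(sender_group, "general analysis")

print(select_scenario("Customer order #9283444 is late", "delivery"))
```

In the chapters that follow, this kind of lookup is replaced by semantic matching against a repository of scenarios rather than a hardcoded dictionary.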
In this notebook, the scenarios are stored in memory. In Chapter 3, we will organize the storage and retrieval of these instruction sets in Pinecone vector stores.
In both cases, we begin by creating a repository of structured scenarios (market, sentiment, and semantic analysis):
scenarios = [
    {
        "scenario_number": 1,
        "description": "Market semantic analysis. You will be provided with a market survey on a given range of products. The term market must be in the user or system input. Your task is to provide an analysis."
    },
    {
        "scenario_number": 2,
        "description": "Sentiment analysis. Read the content and classify it as an opinion or not. If it is not an opinion, stop there. If it is an opinion, perform a sentiment analysis on these statements and provide a score with the label Analysis score: followed by a numerical value between 0 and 1 with no + or - sign. Add an explanation."
    },
    {
        "scenario_number": 3,
        "description": "Semantic analysis. This is not an analysis but a semantic search. Provide more information on the topic."
    }
]
We will also add a list of the same scenarios, containing simple definitions of the scenarios:
# List of single-element sets holding the same scenarios
scenario_instructions = [
    {
        "Market semantic analysis. You will be provided with a market survey on a given range of products. The term market must be in the user or system input. Your task is to provide an analysis."
    },
    {
        "Sentiment analysis. Read the content and return a sentiment analysis on this text, providing a score with the label Sentiment analysis score followed by a numerical value between 0 and 1 with no + or - sign, and add an explanation to justify the score."
    },
    {
        "Semantic analysis. This is not an analysis but a semantic search. Provide more information on the topic."
    }
]
现在我们从字典中提取字符串并将其存储在一个列表中:
We now extract the strings from the dictionary and store them in a list:
# Extract the strings from each dictionary
instructions_as_strings = [
list(entry)[0] for entry in scenario_instructions
]
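值得注意的一个细节:由于每个条目都用花括号包裹单个字符串,Python 会将其解析为单元素集合(set)而非字典。下面这个自包含的小示例(使用占位指令文本,并非书中的完整场景描述)说明了提取为何有效:
One detail worth noting: because each entry wraps a single string in braces, Python parses it as a one-element set rather than a dictionary. The following self-contained sketch (with placeholder instruction text, not the book's full scenario descriptions) shows why the extraction works:

```python
# Each entry below is a one-element set literal (a dict literal would need
# key: value pairs), so list(entry)[0] pulls the lone string out of each set.
scenario_instructions = [
    {"Market semantic analysis instruction."},
    {"Sentiment analysis instruction."},
    {"Semantic search instruction."},
]

instructions_as_strings = [list(entry)[0] for entry in scenario_instructions]

print(instructions_as_strings[1])  # Sentiment analysis instruction.
```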
至此,我们的人工智能控制器已具备识别意图所需的一切——将任何传入的提示与最合适的场景进行匹配。
At this point, our AI controller has everything it needs to recognize intent—matching any incoming prompt to the best-fitting scenario.
我们首先像在"对话式人工智能代理"一节中那样,定义对话式人工智能代理的参数:
We first define the parameters of the conversational AI agent just as we did in the Conversational AI agent section:
# Define the parameters for the function call
mrole = "system"
mcontent = "You are an assistant that matches user inputs to predefined scenarios. Select the scenario that best matches the input. Respond with the scenario_number only."
user_role = "user"
编排器的任务是为任何给定的输入找到最佳任务,从而使人工智能控制器具有灵活性和适应性。在某些情况下,编排器可能决定不应用任何场景,而只是遵循用户的输入。但在以下示例中,编排器将选择一个场景并应用它。
The orchestrator’s job is to find the best task for any given input, making the AI controller flexible and adaptive. In some cases, the orchestrator may decide not to apply a scenario and just follow the user’s input. In the following example, however, the orchestrator will select a scenario and apply it.
现在我们调整输入,以考虑编排器的请求:
We now adjust the input to take the orchestrator’s request into account:
# Adjust `input` to combine user input with scenarios
selection_input = f"User input: {input}\nScenarios: {scenarios}"
print(selection_input)
GPT-4o 现在将执行文本语义相似度搜索,就像我们在语义文本相似度基准测试 (STSB)部分中运行的那样。在这种情况下,它不仅执行纯文本比较,而且还将一个文本(用户输入)与一个文本列表(我们的场景描述)进行匹配:
GPT-4o will now perform a text semantic similarity search as we ran in the Semantic Textual Similarity Benchmark (STSB) section. In this case, it doesn’t just perform a plain text comparison, but matches one text (the user input) against a list of texts (our scenario descriptions):
# Call the function using your standard API call
response = openai_api.make_openai_api_call(
selection_input, mrole, mcontent, user_role
)
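为了直观理解这种"一个文本对多个文本"的匹配,下面是一个纯 Python 的玩具示例,用余弦相似度在向量层面模拟这一选择(向量取值为示意用的假设数据;书中的匹配由 GPT-4o 在文本层面完成):
To make the one-text-versus-many matching concrete, here is a toy, pure-Python illustration that mimics the selection at the vector level with cosine similarity (the vector values are made-up assumptions; in the book, GPT-4o performs the matching at the text level):

```python
import math

# Toy vectors standing in for the user input and the three scenario
# descriptions; the scenario with the highest cosine similarity wins.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

user_vec = [1.0, 0.0]
scenario_vecs = {1: [0.9, 0.1], 2: [0.0, 1.0], 3: [0.5, 0.5]}

best = max(scenario_vecs, key=lambda k: cosine(user_vec, scenario_vecs[k]))
print(best)  # 1
```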
用户输入如下:
Our user input is as follows:
User input: Gladiator II is a great movie
然后,选择场景:
Then, the scenario is chosen:
# Print the response
print("Scenario:",response )
然后选择场景编号,将其与相应的指令一起存储并显示:
The scenario number is then chosen, stored with the instructions that go with it, and displayed:
scenario_number=int(response)
instructions=scenario_instructions[scenario_number-1]
print(instructions)
以《角斗士2》为例,编排器正确地选择了情感分析场景:
For our Gladiator II example, the orchestrator correctly picks the sentiment analysis scenario:
{'Sentiment analysis Read the content return a sentiment analysis on this text and provide a score with the label named : Sentiment analysis score followed by a numerical value between 0 and 1 with no + or - sign and add an explanation to justify the score.'}
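一个小小的稳健性改进(这是我们的假设,并非书中代码):在转换场景编号之前去除空白字符,并在模型回复不是数字时回退到默认场景:
A small robustness tweak (our assumption, not the book's code): strip whitespace before converting the scenario number, and fall back to a default scenario if the model's reply is not a number:

```python
# Defensive parsing of the model's reply. The fallback value is a
# hypothetical choice for illustration.
response = " 2\n"  # example model output with stray whitespace

try:
    scenario_number = int(response.strip())
except ValueError:
    scenario_number = 1  # hypothetical fallback to a default scenario

print(scenario_number)  # 2
```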
这种自主任务选择能力——让 GenAISys 无需显式标签即可选择合适的分析——将在实际部署中发挥至关重要的作用(参见第5章)。该程序现在使用生成式 AI 代理运行这些场景。
This autonomous task-selection capability—letting GenAISys choose the right analysis without explicit tags—will prove invaluable in real-world deployments (see Chapter 5). The program now runs the scenarios with the generative AI agent.
既然 AI 控制器已识别出正确的 scenario_number,现在就可以执行所选任务了。在本笔记本中,我们将逐步介绍该过程。首先,我们打印输入:
Now that the AI controller has identified the correct scenario_number, it’s time to execute the selected task. In this notebook, we’ll walk through that process step by step. We first print the input:
print(input)
利用 scenario_number 的值,我们可以从 instructions_as_strings 列表中访问场景描述:
Using the scenario_number value, we access the scenario description from our instructions_as_strings list:
# Accessing by line number (1-based index)
line_number = scenario_number
instruction = instructions_as_strings[line_number - 1] # Adjusting for 0-based indexing
print(f"Instruction on line {line_number}:\n{instruction}")
mrole = "system"
user_role = "user"
mcontent = instruction
现在编排器已准备好运行情感分析。
The orchestrator is now ready to run a sentiment analysis.
我们将场景描述附加到原始用户提示上,并将组合后的请求发送给 GPT-4o:
We append the description of the scenario to the original user prompt and send the combined request to GPT-4o:
Instruction on line 2:
Sentiment analysis Read the content return a sentiment analysis nalysis on this text and provide a score with the label named : Sentiment analysis score followed by a numerical value between 0 and 1 with no + or - sign and add an explanation to justify the score.
# API function call
sc_input=instruction +" "+ input
print(sc_input)
task_response = openai_api.make_openai_api_call(
sc_input,mrole,mcontent,user_role
)
print(task_response)
以《角斗士2》为例,响应可能如下所示:
For our Gladiator II example, the response might look like this:
Sentiment analysis score 0.75
The text expresses a generally positive sentiment towards the movie "Gladiator II." The use of words like "great movie," "liked the actors," and "really enjoyed the experience" indicates a favorable opinion. However, the mention of not liking some of the scenes introduces a slight negative element. Despite this, the overall enjoyment and positive remarks about the actors and the movie as a whole outweigh the negative aspect, resulting in a sentiment score leaning towards the positive side.
该响应表明编排器找到了一个与输入匹配且能产生可接受输出的场景。现在,让我们返回并更改提示,看看编排器能否找到正确的场景。
The response shows that the orchestrator found a scenario that matches the input and produces an acceptable output. Now, let’s go back, change the prompt, and see whether the orchestrator finds the right scenario.
现在的目标是在不更改任何一行代码的情况下,验证编排器是否能够访问另一个场景。编排器将依靠 GPT-4o 的原生能力来执行语义文本相似性搜索。
The goal now is to verify, without changing a single line of code, whether the orchestrator can access another scenario. The orchestrator will rely on GPT-4o’s native ability to perform semantic text similarity searches.
现在我们将激活提示 2:
We will now activate prompt 2:
prompt=2
…
if prompt==2:
input = "Generative AI models such as GPT-4o can be built into Generative AI Systems. Provide more information."
显然,此输入需要进行语义分析而非情感分析。因此,我们直接复用与情感分析搜索完全相同的代码:
This input clearly calls for a semantic analysis rather than sentiment analysis. We then reuse the exact same code as our sentiment analysis search:
# Accessing by line number (1-based index)
line_number = scenario_number
instruction = instructions_as_strings[line_number - 1] # Adjusting for 0-based indexing
print(f"Instruction on line {line_number}:\n{instruction}")
mrole = "system"
user_role = "user"
mcontent = instruction
输出结果显示找到了正确的场景:
The output shows that the right scenario was found:
Instruction on line 3:
Semantic analysis.This is not an analysis but a semantic search. Provide more information on the topic.
任务响应显示如下:
The task response is displayed:
print(task_response)
输出结果显示,编排器生成了连贯的语义分析:
The output shows that the orchestrator produces a coherent semantic analysis:
Generative AI models, like GPT-4, are advanced machine learning models designed to generate human-like text based on the input they receive….
这表明,在某些情况下,编排器无需任务标签即可找到合适的场景。这在处理更复杂的工作流程(例如高级生产和支持)时将非常有用。
This demonstrates that in some cases, the orchestrator will be able to find the right scenarios without task tags. This will prove useful when we tackle more complex workflows, such as advanced production and support.
本章的第一个要点是人类在 GenAISys 中的核心作用。对话代理和编排器的创建都源于人类的设计。我们最初仅使用 OpenAI API 和 Python 开发这两个复杂的组件,但驱动我们定制版 GenAISys 的 AI 控制器的初始层级是由我们人类设计的。GenAISys 的基本原则始终适用:没有人类角色,就没有 GenAISys。我们设计人工智能系统,实现它们,维护它们,并根据持续的反馈不断改进它们。
The first takeaway from this chapter is the central role of humans in a GenAISys. Human design drove the creation of both our conversational agent and orchestrator. We started developing these two complex components with simply an OpenAI API and Python, yet we humans designed the initial levels of the AI controller that powers our custom GenAISys. The basic GenAISys rule will always apply: no human roles, no GenAISys. We design AI systems, implement them, maintain them, and evolve them based on ongoing feedback.
第二个要点是,我们的对话式人工智能代理超越了类似 ChatGPT 的小规模结构。我们不仅构建了完整对话的短期上下文和记忆保持能力,还添加了跨多个用户和多个主题的长期记忆功能。我们的对话涉及三个用户(John、Myriam 和 Bob)和两个主题(地质学和冲浪)。随着本书的深入,我们将扩展这些多用户、多主题会话的范围,使其适用于团队协作至关重要的应用场景。
The second takeaway is how our conversational AI agent goes beyond a small-scale ChatGPT-like structure. We not only built short-term context and memory retention for a full-turn dialogue, but we also added long-term memory across multiple users and multiple topics. Our dialogue included three users (John, Myriam, and Bob) and two topics (geology and surfing). As we progress through the book, we will expand the scope of these multi-user, multi-topic sessions to use cases where team cooperation is essential.
第三个要点与我们的AI控制器编排器有关。我们给编排器提供了一个包含自定义指令的小型场景数据集,这些指令可以扩展到特定领域的用例,然后利用GPT-4o来选择合适的场景并执行任务本身。
The third takeaway concerns our AI controller orchestrator. We gave the orchestrator a small scenario dataset containing custom instructions that we can expand for a domain-specific use case, and then leveraged GPT-4o to both select the appropriate scenario and execute the task itself.
目前,我们已经拥有一个对话代理和一个初步的AI控制器编排器。当我们组装好AI控制器后,它们将共同构成一个独特的、多用户、多域的定制化GenAISys系统。为了构建这个多用户、多域的GenAISys AI控制器,我们将在下一章中构建一个Pinecone向量存储。
At this point, we have a conversational agent and a nascent AI controller orchestrator. When we assemble our AI controller, they will together form a unique multi-user, multi-domain customized GenAISys. To build our multi-user, multi-domain GenAISys AI controller, we will now build a Pinecone vector store in the next chapter.
新框架、演进的架构、研究成果发布、生产故障——AI_Distilled每周都会为从事 LLM 和 GenAI 系统实战的工程师和研究人员提供一份简报,帮您过滤掉各种干扰信息。立即订阅,即可免费获得电子书,以及每周更新的深度见解,助您保持专注,掌握最新资讯。
New frameworks, evolving architectures, research drops, production breakdowns—AI_Distilled filters the noise into a weekly briefing for engineers and researchers working hands-on with LLMs and GenAI systems. Subscribe now and receive a free eBook, along with weekly insights that help you stay focused and informed.
订阅请访问https://packt.link/TRO5B或扫描下方二维码。
Subscribe at https://packt.link/TRO5B or scan the QR code below.
一个可用于商业用途的生成式人工智能系统(GenAISys)需要具备灵活性,并能够应对人工智能市场快速发展的格局。人工智能控制器充当着电子营销、生产、存储、分发和支持等环节的自适应编排器,但为了满足如此广泛的任务需求,我们需要一个检索增强生成(RAG)框架。在前一章中,我们构建了一个对话式人工智能代理和一个用于生成式人工智能模型的指令场景相似性搜索函数(AI编排器)。在本章中,我们将增强这一基础,并在 Pinecone 索引中构建一个可扩展的 RAG,以整合指令场景和经典数据,供生成式人工智能模型连接使用。
A business-ready generative AI system (GenAISys) needs to be flexible and ready to face the rapidly evolving landscape of the AI market. The AI controller acts as an adaptive orchestrator for e-marketing, production, storage, distribution, and support, but to satisfy such a range of tasks, we need a retrieval-augmented generation (RAG) framework. In the previous chapter, we built a conversational AI agent and a function for similarity search for instruction scenarios (AI orchestrator) for a generative AI model. In this chapter, we will enhance that foundation and build a scalable RAG in a Pinecone index to integrate both instruction scenarios and classical data, which the generative AI model will connect to.
在本章中,我们对指令场景(专家精心设计的提示片段,或如上一章所述的任务标签,用于告诉模型如何推理或行动)和经典数据(RAG 系统检索的、用于支撑其答案的参考资料)进行了明确区分。
We make a clear distinction in this chapter between instruction scenarios—expert-crafted prompt fragments (or task tags, as explained in the previous chapter) that tell the model how to reason or act—and classical data—the reference material the RAG system retrieves to ground its answers.
为什么我们需要这种基于传统数据、动态自适应的 RAG 框架,并在此基础上构建向量化的指令场景?因为全球市场会对企业的内部和外部环境产生影响。例如,飓风可能导致电力短缺,危及企业的供应链。企业可能不得不迁移供应路线、生产或分销渠道。通用人工智能云平台或许能够完成部分工作,但更多时候,我们需要提供定制化的、特定领域的功能。因此,我们需要像 RAG 数据一样,在向量存储库中存储一套动态指令集。
Why do we need this dynamic and adaptive RAG framework with vectorized scenarios of instructions on top of classical data? Because the global market affects entities internally and externally. For example, a hurricane can cause electricity shortages, putting the supply chain of businesses in peril. Businesses might have to relocate supply routes, production, or distribution. General-purpose AI cloud platforms might do some of the job. But more often than not, we will need to provide custom, domain-specific functionality. For that reason, we need a dynamic set of instructions in a vector store repository as we do for RAG data.
我们将首先定义通过 Pinecone 索引为生成式 AI 模型(在本例中为 GPT-4o)执行场景驱动任务的架构。我们将仔细分析通过相似性搜索和检索为生成式模型投资智能场景的成本效益。我们将引入一个动态框架来产生类似 ChatGPT 的功能,并在后续章节中逐步介绍。
We will begin by defining the architecture scenario-driven task executions for a generative AI model, in this case, GPT-4o, through a Pinecone index. We will carefully go through the cost-benefits of investing in intelligent scenarios for the generative model through similarity search and retrieval. We will introduce a dynamic framework to produce ChatGPT-like capabilities that we will progressively introduce in the following chapters.
架构确定后,我们将首先构建一个 Pinecone 索引,用于对指令场景进行分块、嵌入和更新插入。我们将确保 GenAISys 向量存储能够嵌入查询并找到相关的指令场景。在第四章“构建 AI 控制器编排接口”中,我们将运用此功能来设计对话代理的界面和编排器。最后,我们将编写一个程序,将 RAG 环境中的经典数据更新插入到与指令场景相同的 Pinecone 索引中。我们将使用不同的命名空间来区分场景和经典数据。在本章结束时,我们将构建将指令链接到生成式 AI 模型的主要组件。我们将准备好在第四章中设计用户界面和 AI 控制器编排器。
Once the architecture is defined, we will first build a Pinecone index to chunk, embed, and upsert instruction scenarios. We will make sure the GenAISys vector store can embed a query and find a relevant instruction scenario. This capability will be a key component in Chapter 4, Building the AI Controller Orchestration Interface, when we design the conversational agent’s interface and orchestrator. Finally, we will write a program to upsert classical data in a RAG environment to the same Pinecone index alongside the instruction scenarios. Differentiation between scenarios and classical data will be maintained using distinct namespaces. By the end of this chapter, we will have built the main components to link instructions to a generative AI model. We will be ready to design a user interface and AI controller orchestrator in Chapter 4.
本章涵盖以下主题:
This chapter covers the following topics:
我们的首要任务是构建一个用于动态检索的 RAG 框架。
Our first task is to architect a RAG framework for dynamic retrieval.
在本节中,我们将定义一个同时存储指令场景和经典数据的 Pinecone 索引。这种结构使 GenAISys 能够实现动态且经济高效的检索:指令场景引导生成式 AI 模型(在本例中为 GPT-4o),而经典数据则为 RAG 流水线提供事实背景。
In this section, we will define a Pinecone index that stores both instruction scenarios and classical data. This structure gives GenAISys dynamic, cost-effective retrieval: the instruction scenarios steer the generative AI model (GPT-4o in our example), while the classical data supplies the factual context used by the RAG pipeline.
我们将逐一介绍以下组件:
We will go through the following components:
我们先来深入了解一下场景驱动的任务执行。
Let’s first dive deeper into scenario-driven task execution.
在上一章中,我们看到了人工智能控制器选择下一步操作的两种互补方式:
In the previous chapter, we saw two complementary ways the AI controller can pick what to do next:
本章延续了同样的模式,但规模更大。我们不再管理少量精心挑选的提示,而是管理存储在向量数据库中的数百甚至数千个由专家编写的指令场景;我们不再支持单用户实验,而是支持多个并发用户和工作流程。这种场景驱动的(隐式)方法具有以下三个优势:
That same pattern continues in this chapter, but at a larger scale. Instead of a few hand-picked prompts, we manage hundreds or even thousands of expert-authored instruction scenarios stored in a vector database; instead of single-user experiments, we support many concurrent users and workflows. This scenario-driven (implicit) approach has three advantages:
然而,虽然隐式任务规划能够最大限度地提高灵活性,但随着我们逐步构建可用于业务的系统,我们必须在灵活性和成本效益之间取得平衡。因此,在某些情况下,显式指令(例如通过选择用户界面中的选项来触发网络搜索)可以显著降低潜在成本,如图 3.1所示:
However, while implicit task planning maximizes flexibility, as we move toward building business-ready systems, we must balance flexibility with cost-efficiency. In some cases, therefore, explicit instructions, such as triggering a web search by selecting the option in the user interface, can significantly reduce the potential costs, as shown in Figure 3.1:
图 3.1:成本增加导致收益递减
Figure 3.1: Diminishing returns as costs increase
我们越是通过 Pinecone 索引中的向量相似性搜索,来自动化生成式 AI 模型将要选择的隐式场景,成本就越高。为了控制成本,我们必须仔细考虑收益递减规律:
The more we automate implicit scenarios that the generative AI model will select with vector similarity searches in the Pinecone index, the higher the cost. To manage this, we must carefully consider the law of diminishing returns:
如图 3.1 所示,在这个方程中(以理论单位表示),我们有以下内容:
In this equation, as illustrated in Figure 3.1, in theoretical units, we have the following:
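本节摘录中未给出该方程本身。一个与下文描述(收益被成本平方的递减项削减)以及所引数字(成本 7 时收益约 7,成本 14 时收益约 10)相一致的形式如下,其中系数 a ≈ 1.28 和 d ≈ 0.04 为我们的假设值,并非书中原始方程:
The equation itself is not reproduced in this excerpt. A form consistent with the description (benefits reduced by a squared-cost diminishing term) and with the quoted figures (benefit ≈ 7 at cost 7, benefit ≈ 10 at cost 14) would be the following, where the coefficients a ≈ 1.28 and d ≈ 0.04 are our assumptions, not the book's original equation:

```latex
B(C) = aC - dC^{2},\qquad
B(7) = 1.28 \cdot 7 - 0.04 \cdot 7^{2} = 7,\qquad
B(14) = 1.28 \cdot 14 - 0.04 \cdot 14^{2} \approx 10
```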
例如,当成本达到7个理论单位时,收益也达到7个理论单位。1单位成本产生1单位收益是合理的。然而,当收益达到10单位时,成本可能会翻倍至14单位,这表明情况可能不妙。
For example, when the cost reaches 7 theoretical units, the benefit reaches 7 theoretical units. This 1 unit of cost generating 1 unit of benefit is reasonable. However, when the benefit reaches 10 units, the cost could double to 14 units, which signals that something is going wrong.
递减因子通过成本的平方项对收益产生强烈的负面影响。
The diminishing factor has a strong negative impact on the benefits through squared costs.
我们将密切关注这一因素。随着本书中用例的深入讲解,我们将需要在两种情况下做出选择:一是通过 Pinecone 索引运行隐式自动场景选择,二是直接在提示信息中通过预定义的指令显式触发操作。
We will carefully monitor the factor as we move through the use cases in this book. We will have to make choices between running implicit automated scenario selections through the Pinecone index and explicitly triggering actions through predefined instructions in the prompt itself.
现在让我们来探讨如何在 Pinecone 索引中识别指令场景。
Let’s now explore how we identify instruction scenarios within a Pinecone index.
我们的第一步是教会 GenAISys 框架清晰地区分经典数据和指令场景。为此,我们将在名为 genai-v1 的同一个 Pinecone 索引中,使用两个命名空间来分离指令场景和数据:
Our first step is teaching the GenAISys framework to distinguish clearly between classical data and instruction scenarios. To achieve this, we will separate the instruction scenarios and the data with two namespaces within the same Pinecone index, named genai-v1:
genaisys 将包含信息的指令向量
genaisys will contain instruction vectors of information
data01 将包含信息的数据向量
data01 will contain data vectors of information
我们将在本章的"创建 Pinecone 索引"一节中用代码实现 genai-v1,并附上更多解释。
We will implement genai-v1 in code with additional explanations in the Creating the Pinecone index section of this chapter.
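下面用一个普通的 Python 字典代替真实的 genai-v1 索引,示意这两个命名空间如何将场景与数据分开(这只是玩具示例,并非 pinecone-client 的 API):
The following toy sketch uses a plain Python dict in place of the real genai-v1 index to illustrate how the two namespaces keep scenarios and data separate (an illustration only, not the pinecone-client API):

```python
# Toy stand-in for the genai-v1 index: namespace -> {id: vector}.
index = {"genaisys": {}, "data01": {}}

def upsert(namespace, vector_id, vector):
    # Store (or overwrite) a vector under its ID in the given namespace
    index[namespace][vector_id] = vector

upsert("genaisys", "100", [0.1, 0.2])   # an instruction scenario vector
upsert("data01", "doc-1", [0.3, 0.4])   # a classical data vector

print(sorted(index))  # ['data01', 'genaisys']
```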
一旦将 Pinecone 指数划分为场景和数据,我们就可以通过混合检索将 GenAISys 提升到另一个水平,如图3.2所示。
Once the Pinecone index has been partitioned into scenarios and data, we can take the GenAISys to another level with hybrid retrieval, as shown in Figure 3.2.
图 3.2:AI 控制器协调 GenAISys
Figure 3.2: AI controller orchestrating GenAISys
上图所示的混合检索框架将使 GenAISys 能够执行以下操作:
The hybrid retrieval framework depicted in the preceding figure will enable GenAISys to do the following:
CoT 循环既可以通过向量相似性搜索隐式运行,也可以通过直接指令触发器或任务标签(例如"运行网络搜索")显式运行。例如,类似 ChatGPT 的副驾驶可以直接通过用户界面或 AI 控制器中的规则触发网络搜索。
CoT loops can operate implicitly via vector similarity search or explicitly via direct instruction triggers or task tags (such as “Run a web search”). For example, ChatGPT-like copilots can trigger web searches directly through the user interface or rules in the AI controller.
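下面是一个玩具路由器(仅为示意,并非书中的实现):输入中的显式任务标签优先;否则,控制器回退到隐式场景选择(书中通过向量相似性完成):
Here is a toy router (an illustration, not the book's implementation): explicit task tags in the input take priority; otherwise, the controller falls back to implicit scenario selection (performed via vector similarity in the book):

```python
# Explicit triggers are cheap string checks; the implicit path is where the
# costlier vector similarity search would run.
def route(user_input):
    if "run a web search" in user_input.lower():
        return "web_search"          # explicit trigger / task tag
    return "scenario_selection"      # implicit fallback

print(route("Run a web search on GPU prices"))  # web_search
print(route("Gladiator II is a great movie"))   # scenario_selection
```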
我们将在本章开始构建 GenAISys,并在接下来的几章中不断完善它。从第 4 章“构建 AI 控制器编排接口”开始,我们将使用这里介绍的 RAG 基础来开发如图 3.2所示的混合检索框架。我们构建的 GenAISys 将包含动态进程管理——这是跟上时代步伐的必要条件。市场环境不断变化。具体而言,我们的 GenAISys 将执行以下操作:
We’ll begin building our GenAISys in this chapter and continue refining it over the next few chapters. Starting from Chapter 4, Building the AI Controller Orchestration Interface, we’ll use the RAG foundations introduced here to develop the hybrid retrieval framework shown in Figure 3.2. The GenAISys we’re building will include dynamic process management—a requisite for keeping pace with the shifting market conditions. Specifically, our GenAISys will do the following:
本章的第一步是构建 genai-v1 Pinecone 索引,AI 控制器将使用该索引来管理 genaisys 命名空间内的指令场景。接下来,我们将演示如何将经典数据分块、嵌入并更新插入到 data01 命名空间中。让我们开始吧!
Our initial step in this chapter is building the genai-v1 Pinecone index, which the AI controller will use to manage instruction scenarios within the genaisys namespace. Then, we’ll demonstrate how to chunk, embed, and upsert classical data into the data01 namespace. Let’s get moving!
我们将重点介绍如何创建一个用于管理指令场景和经典数据的 Pinecone 索引。在接下来的几节中,我们将开始更新插入指令场景以及经典数据。整个工作流程分为三个简单的阶段:
We’ll focus on creating a Pinecone index designed to manage both instruction scenarios and classical data. In the upcoming sections, we’ll begin upserting the instruction scenarios as well as classical data. The workflow breaks down into three straightforward stages:
打开 GitHub 上 Chapter03 目录下的 Pinecone_instruction_scenarios.ipynb 笔记本文件(https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main)。我们的首要任务是搭建环境。
Open the Pinecone_instruction_scenarios.ipynb notebook within the Chapter03 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). Our first task is to set up the environment.
随着本书的深入,我们将不断重用第1章和第2章中实现的函数和功能,为 Pinecone 添加新的函数和功能,并将安装内容组织成两部分:
As we move through the book, we will continually reuse functions and features implemented in Chapters 1 and 2, add new ones for Pinecone, and organize the installations into two parts:
首先,从 GitHub 仓库中获取 grequests.py,下载我们需要的文件:
To begin, download the files we need by retrieving grequests.py from the GitHub repository:
!curl -L https://raw.githubusercontent.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/master/commons/grequests.py --output grequests.py
要安装 OpenAI,请按照第 1 章中的步骤操作。接下来我们将安装 Pinecone,本书后续章节中会用到它。
To install OpenAI, follow the same steps as in Chapter 1. We’ll move on to install Pinecone now, which we will refer to in upcoming chapters throughout the book.
下载 Pinecone 需求文件,其中包含本书要使用的 Pinecone 版本的说明。如果需要其他版本,只需更新此文件即可:
Download the Pinecone requirements file that contains the instructions for the Pinecone version we want to use throughout the book. If another version is required, this will be the only file that needs to be updated:
download("commons","requirements02.py")
小贴士:使用AI 代码解释器和快速复制功能,提升您的编码体验。在新一代 Packt 阅读器中打开本书。点击“复制”按钮。
Quick tip: Enhance your coding experience with the AI Code Explainer and Quick Copy features. Open this book in the next-gen Packt Reader. Click the Copy button
(1)快速将代码复制到您的编码环境中,或单击“解释”按钮
(1) to quickly copy code into your coding environment, or click the Explain button
(2)让 AI 助手向您解释一段代码。
(2) to get the AI assistant to explain a block of code to you.
购买本书即可免费获赠新一代 Packt Reader 阅读器。扫描二维码或访问packtpub.com/unlock,然后使用搜索栏按书名查找本书。请仔细核对显示的版本,确保您购买的是正确的版本。
The next-gen Packt Reader is included for free with the purchase of this book. Scan the QR code OR visit packtpub.com/unlock, then use the search bar to find this book by name. Double-check the edition shown to make sure you get the right one.
该文件包含安装函数,我们将使用以下命令调用该函数:
The file contains the installation function, which we will call with the following command:
# Run the setup script to install and import dependencies
%run requirements02
该脚本与第一章中描述的 OpenAI 脚本相同,但已针对 Pinecone 进行了适配。我们首先卸载 Pinecone,然后安装所需的版本:
The script is the same as the one for OpenAI described in Chapter 1, but adapted to Pinecone. We first uninstall Pinecone and then install the version we need:
import subprocess
import sys
def run_command(command):
try:
subprocess.check_call(command)
except subprocess.CalledProcessError as e:
print(f"Command failed: {' '.join(command)}\nError: {e}")
sys.exit(1)
# Uninstall the 'pinecone-client' package
print("Uninstalling 'pinecone-client'...")
run_command(
[sys.executable, "-m", "pip", "uninstall", "-y", "pinecone-client"]
)
# Install the specific version of 'pinecone-client'
print("Installing 'pinecone-client' version 5.0.1...")
run_command(
[
sys.executable, "-m", "pip", "install",\
"--force-reinstall", "pinecone-client==5.0.1"
]
)
然后,我们验证安装:
Then, we verify the installation:
# Verify the installation
try:
import pinecone
print(
f"'pinecone-client' version {pinecone.__version__} is installed."
)
except ImportError:
print(
"Failed to import the 'pinecone-client' library after installation."
)
sys.exit(1)
输出结果显示客户端已成功安装:
The output shows we have successfully installed the client:
Uninstalling 'pinecone-client'...
Installing 'pinecone-client' version 5.0.1...
'pinecone-client' version 5.0.1 is installed.
接下来,我们来初始化 Pinecone API 密钥。
Let’s go ahead and initialize the Pinecone API key.
程序现在下载 pinecone_setup.py,我们将使用它来初始化 Pinecone API 密钥:
The program now downloads pinecone_setup.py, which we will use to initialize the Pinecone API key:
download("commons","pinecone_setup.py")
这种设置与我们在第 1 章中用于 OpenAI 的基于 Google Colab 密钥的方法类似,但在这里进行了调整,用于初始化 Pinecone API。
This setup mirrors the Google Colab secrets-based approach we used for OpenAI in Chapter 1, but it’s adapted here for initializing the Pinecone API:
# Import libraries
import openai
import os
from google.colab import userdata
# Function to initialize the Pinecone API key
def initialize_pinecone_api():
# Access the secret by its name
PINECONE_API_KEY = userdata.get('PINECONE_API_KEY')
if not PINECONE_API_KEY:
raise ValueError("PINECONE_API_KEY is not set in userdata!")
# Set the API key in the environment and OpenAI
os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY
print("PINECONE_API_KEY initialized successfully.")
如果在本笔记本的 OpenAI 部分中将 Google Secrets 设置为 True,则将调用 Pinecone 设置函数:
If Google secrets was set to True for OpenAI in the OpenAI section of this notebook, then the Pinecone setup function will be called:
if google_secrets==True:
import pinecone_setup
pinecone_setup.initialize_pinecone_api()
如果 Google Secrets 设置为 False,则可以取消注释代码,并使用您希望的任何方法输入 Pinecone API 密钥,以实现自定义函数:
If Google secrets was set to False, then you can implement a custom function by uncommenting the code and entering the Pinecone API key with any method you wish:
if google_secrets==False: # Uncomment the code and choose any method you wish to initialize the Pinecone API key
import os
#PINECONE_API_KEY=[YOUR PINECONE_API_KEY]
#os.environ['PINECONE_API_KEY'] = PINECONE_API_KEY
#print("Pinecone API key initialized successfully.")
程序现在已准备好处理我们将要插入到 Pinecone 索引中的数据。
The program is now ready to process the data we will upsert to the Pinecone index.
我们现在的目标是准备场景的存储和检索,以便之后可以查询 Pinecone 索引。该过程的主要步骤如图 3.2 所示,而这只是后续章节路线图中的一层。我们将按以下步骤处理数据:
Our goal now is to prepare the scenarios for storage and retrieval so that we can then query the Pinecone index. The main steps of the process are represented in Figure 3.2, which is only one layer of the roadmap for the following chapters. We will process the data in the following steps:
现在我们来介绍前两个步骤:加载和准备数据,然后进行分块。
Let’s now cover the first two steps: loading and preparing the data, followed by chunking.
我们将使用第2章中实现的场景。它们存储在一个文件中,我们现在下载该文件:
We will use the scenarios implemented in Chapter 2. They are stored in a file that we will now download:
download("Chapter03","scenario.csv")
在本书的学习过程中,我们将逐步添加更多场景,最终构建一个 GenAISys。目前,我们的主要目标是让 Pinecone 索引正常工作。程序首先初始化 start_time 以进行计时。然后,我们将场景指令逐行直接加载到 chunks 中:
We will add more scenarios throughout our journey in this book to create a GenAISys. For the moment, our main objective is to get our Pinecone index to work. The program first initializes start_time for time measurement. Then we load the lines of scenario instructions directly into chunks line by line:
import time
start_time = time.time() # Start timing
# File path
file_path = 'scenario.csv'
# Read the file, skip the header, and clean the lines
chunks = []
with open(file_path, 'r') as file:
next(file) # Skip the header line
chunks = [line.strip() for line in file] # Read and clean lines as chunks
然后代码会显示数据块的数量以及创建这些数据块所花费的时间:
Then the code displays the number of chunks and the time it took to create the chunks:
response_time = time.time() - start_time # Measure response time
print(f"Response Time: {response_time:.2f} seconds") # Print response time
Total number of chunks: 3
Response Time: 0.00 seconds
程序现在验证场景指令的前三个数据块:
The program now verifies the first three chunks of scenario instructions:
# Optionally, print the first three chunks for verification
for i, chunk in enumerate(chunks[:3], start=1):
print(chunk)
输出结果显示了本章我们将要讨论的三种场景:
The output shows the three scenarios we will be working on in this chapter:
['ID,SCENARIO\n',
'100,Semantic analysis.This is not an analysis but a semantic search. Provide more information on the topic.\n',
'200,Sentiment analysis Read the content return a sentiment analysis nalysis on this text and provide a score with the label named : Sentiment analysis score followed by a numerical value between 0 and 1 with no + or - sign and add an explanation to justify the score.\n',
'300,Semantic analysis.This is not an analysis but a semantic search. Provide more information on the topic.\n']
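每个数据块都保留其原始 CSV 形式"ID,SCENARIO"。如果后续步骤需要,可以用如下假设的辅助代码将 ID 与指令文本分开:
Each chunk keeps its raw CSV form, "ID,SCENARIO". Should a later step need it, a hypothetical helper like the following separates the ID from the instruction text:

```python
# Split only on the first comma, since the scenario text itself may
# contain commas. The chunk text here is an abbreviated example.
chunk = "100,Semantic analysis. Provide more information on the topic."
scenario_id, text = chunk.split(",", 1)

print(scenario_id)  # 100
print(text)         # Semantic analysis. Provide more information on the topic.
```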
数据块已准备好进行嵌入。让我们开始嵌入过程。
The chunks of data are now ready for embedding. Let’s proceed with embedding.
要嵌入数据集,我们首先初始化嵌入模型,然后嵌入数据块。程序首先初始化嵌入模型。
To embed the dataset, we will first initialize the embedding model and then embed the chunks. The program first initializes the embedding model.
我们将使用 OpenAI 嵌入模型来嵌入数据。要使用 OpenAI 模型嵌入数据,我们可以从以下三种主要模型中选择一种:
We will be using an OpenAI embedding model to embed the data. To embed our data with an OpenAI model, we can choose one of three main models:
text-embedding-3-small:速度快、资源占用低,足以满足实时应用的需求。它是较小的模型,因此成本效益高。然而,随着向量存储因复杂场景而增大,它在精细任务上的准确性可能较低。
text-embedding-3-small, which is fast and has a lower resource usage. This is sufficient for real-time usage. It is a smaller model and is thus cost-effective. However, as the vector store will increase in size with complex scenarios, it might be less accurate for nuanced tasks.
text-embedding-3-large:能够提供高精度和细致入微的嵌入,对于复杂的语义相似性搜索非常有效。但它需要更多资源,成本也更高。
text-embedding-3-large, which provides high accuracy and nuanced embeddings and will prove effective for complex semantic similarity searches. It requires more resources and costs more.
text-embedding-ada-002:对于高质量的嵌入而言是一种经济高效的选择。但是,它比 text-embedding-3-small 和 text-embedding-3-large 等模型稍慢。
text-embedding-ada-002, which is cost-effective for good-quality embeddings. However, it’s slightly slower than models such as text-embedding-3-small and text-embedding-3-large.
您可以访问 https://platform.openai.com/docs/guides/embeddings 查看 OpenAI 文档,了解更多信息。
You can consult the OpenAI documentation at https://platform.openai.com/docs/guides/embeddings for more info.
为了在本章中导入数量有限的场景,我们将使用 text-embedding-3-small 来优化速度和成本。程序会初始化该模型,其他模型则以注释形式保留,以备日后需要时使用:
To import a limited number of scenarios in this chapter, we will use text-embedding-3-small to optimize speed and cost. The program initializes the model while the others are commented for further use if needed:
import openai
import time
embedding_model="text-embedding-3-small"
#embedding_model="text-embedding-3-large"
#embedding_model="text-embedding-ada-002"
我们初始化 OpenAI 客户端:
We initialize the OpenAI client:
# Initialize the OpenAI client
client = openai.OpenAI()
然后创建一个嵌入函数,将发送给它的文本转换为嵌入向量。该函数旨在为一批输入文本(texts)生成嵌入,使用的嵌入模型是我们选择的,在本例中为text-embedding-3-small:
An embedding function is then created that will convert the text sent to it into embeddings. The function is designed to produce embeddings for a batch of input texts (texts) with the embeddings model of our choice, in this case, text-embedding-3-small:
def get_embedding(texts, model="text-embedding-3-small"):
该函数首先清理文本,将每段文本中的换行符替换为空格:
The function first cleans the text by replacing newline characters in each text with spaces:
texts = [text.replace("\n", " ") for text in texts]
然后,该函数会发起 API 嵌入调用:
Then, the function makes the API embedding call:
response = client.embeddings.create(input=texts, model=model)
词嵌入是从响应中提取出来的:
The embeddings are extracted from the response:
embeddings = [res.embedding for res in response.data] # Extract embeddings
最后,返回嵌入向量:
Finally, the embeddings are returned:
return embeddings
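上面的片段可以组装成下面这个函数。为了让示例自包含地运行,这里用一个返回固定向量的模拟客户端代替了 OpenAI 客户端(该模拟是我们的假设,真实向量来自 API):
The fragments above assemble into the function below. To keep the sketch self-contained and runnable, the OpenAI client is replaced with a mock that returns fixed vectors (the mock is our assumption; real vectors come from the API):

```python
from types import SimpleNamespace

# Mock client mimicking the response shape used above: response.data[i].embedding
class _MockEmbeddings:
    def create(self, input, model):
        return SimpleNamespace(
            data=[SimpleNamespace(embedding=[0.0, 1.0]) for _ in input]
        )

client = SimpleNamespace(embeddings=_MockEmbeddings())

def get_embedding(texts, model="text-embedding-3-small"):
    texts = [text.replace("\n", " ") for text in texts]        # clean newlines
    response = client.embeddings.create(input=texts, model=model)
    embeddings = [res.embedding for res in response.data]      # extract vectors
    return embeddings

print(len(get_embedding(["a\nb", "c"])))  # 2
```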
程序现在可以嵌入数据块了。
The program is now ready to embed the chunks.
程序首先定义一个用于嵌入数据块的函数:
The program first defines a function to embed the chunks:
def embed_chunks(
chunks, embedding_model="text-embedding-3-small",
batch_size=1000, pause_time=3
):
The parameters of the function are as follows:
- chunks: The parts of text to embed
- embedding_model: Defines the model to use, such as text-embedding-3-small
- batch_size: The number of chunks the function can process in a single batch, such as batch_size=1000
- pause_time: A pause time in seconds, which can be useful for rate limits
We then initialize the timer, the embeddings list, and the batch counter:
start_time = time.time() # Start timing the operation
embeddings = [] # Initialize an empty list to store the embeddings
counter = 1 # Batch counter
The code is now ready to process the chunks in batches:
# Process chunks in batches
for i in range(0, len(chunks), batch_size):
chunk_batch = chunks[i:i + batch_size] # Select a batch of chunks
Each batch is then sent to the embedding function:
# Get the embeddings for the current batch
current_embeddings = get_embedding(
chunk_batch, model=embedding_model
)
The embedded batch is appended to the embeddings list:
# Append the embeddings to the final list
embeddings.extend(current_embeddings)
The number of batches is monitored and displayed, and the pause is activated:
# Print batch progress and pause
print(f"Batch {counter} embedded.")
counter += 1
time.sleep(pause_time) # Optional: adjust or remove this depending on rate limits
Once all the batches are processed, the total time is displayed:
# Print total response time
response_time = time.time() - start_time
print(f"Total Response Time: {response_time:.2f} seconds")
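To see the batching mechanics without calling the API, here is a dry-run sketch of the loop with the real API call swapped for a stand-in (fake_get_embedding is purely illustrative and returns one tiny vector per text):

```python
import time

def fake_get_embedding(texts, model="text-embedding-3-small"):
    # Stand-in for the real API call: one tiny vector per input text
    return [[float(len(t))] for t in texts]

def embed_chunks(chunks, embedding_model="text-embedding-3-small",
                 batch_size=2, pause_time=0):
    start_time = time.time()  # Start timing the operation
    embeddings = []           # Accumulates the results of all batches
    counter = 1               # Batch counter
    for i in range(0, len(chunks), batch_size):
        chunk_batch = chunks[i:i + batch_size]  # Select a batch of chunks
        embeddings.extend(fake_get_embedding(chunk_batch, model=embedding_model))
        print(f"Batch {counter} embedded.")
        counter += 1
        time.sleep(pause_time)
    print(f"Total Response Time: {time.time() - start_time:.2f} seconds")
    return embeddings

embs = embed_chunks(["a", "bb", "ccc"])  # 3 chunks, batch_size=2 -> 2 batches
print(len(embs))  # 3
```

With three chunks and a batch size of two, the loop runs twice and the final list contains one embedding per chunk, which is exactly the invariant we verify later in the notebook.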
The function finally returns the embeddings list so that the result can be assigned when it is called. The embedding function is ready to be called with the chunks list:
embeddings = embed_chunks(chunks)
The output shows that the scenario data has been embedded:
Batch 1 embedded.
Total Response Time: 4.09 seconds
The first embedding is displayed for verification:
print("First embedding:", embeddings[0])
The output confirms that the embeddings have been generated:
First embedding: [0.017762450501322746, 0.041617266833782196, -0.024105189368128777,…
The final verification is to check that the number of embeddings matches the number of chunks:
# Check the lengths of the chunks and embeddings
num_chunks = len(chunks)
print(f"Number of chunks: {num_chunks}")
print(f"Number of embeddings: {len(embeddings)}")
The output confirms that the chunking and embedding process was most likely successful:
Number of chunks: 3
Number of embeddings: 3
The chunks and embeddings are now ready to be upserted into the Pinecone index.
The genai-v1 Pinecone index we will create will contain two namespaces, as shown in Figure 3.3:
- genaisys: A repository of instruction scenarios. These prompts drive generative AI behavior and can also trigger traditional functions such as web search.
- data01: The embedded classical data that the RAG pipeline queries.
Figure 3.3: Partitioning the Pinecone index into namespaces
We begin by importing two classes:
from pinecone import Pinecone, ServerlessSpec
The Pinecone class is the primary interface to interact with the Pinecone index. We will use this class to configure Pinecone's serverless services.
Before going further, you will need to set up a Pinecone account and obtain an API key. Make sure to verify the cost of these services at https://www.pinecone.io/. This chapter is self-contained, so you can begin by reading the content, comments, and code before deciding on creating a Pinecone account.
Once our account is set up, we need to retrieve and initialize our API key:
# Retrieve the API key from environment variables
api_key = os.environ.get('PINECONE_API_KEY')
if not api_key:
raise ValueError("PINECONE_API_KEY is not set in the environment!")
# Initialize the Pinecone client
pc = Pinecone(api_key=api_key)
We now import the specification class, define the name of our index (genai-v1), and initialize our first namespace (genaisys) for our scenarios:
from pinecone import ServerlessSpec
index_name = "genai-v1"
namespace="genaisys"
We now have a project management decision to make: should we use the Pinecone cloud to host our index, or Amazon Web Services (AWS)?
cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'
spec = ServerlessSpec(cloud=cloud, region=region)
The code first checks whether the PINECONE_CLOUD environment variable is set to use the Pinecone cloud. If no environment variable is set, the variable defaults to AWS, with 'aws' as the cloud and 'us-east-1' as the default region.
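The `or` fallback pattern can be checked in isolation. Note a subtlety: an empty environment variable is falsy in Python, so it also triggers the default, which differs from passing a `default` argument to `os.environ.get`. A quick illustration (clearing the variables is for the demo only):

```python
import os

# Clear the variables so the fallback triggers (demo only)
os.environ.pop('PINECONE_CLOUD', None)
os.environ.pop('PINECONE_REGION', None)

cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'
print(cloud, region)  # aws us-east-1

# An empty string is falsy, so it also falls back to the default
os.environ['PINECONE_CLOUD'] = ''
print(os.environ.get('PINECONE_CLOUD') or 'aws')  # aws
```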
For more information, refer to the Pinecone Python SDK documentation at https://docs.pinecone.io/reference/python-sdk.
In this case, AWS was chosen for the following reasons:
Alternatively, you can create an index manually in your Pinecone console to select the embedding model and the host, such as AWS or Google Cloud Platform (GCP). You can also select your pod size from x1 to more, which will determine the maximum size of your index. Each choice depends on your project and resource optimization strategy.
In any case, we need metrics to monitor usage and cost. Pinecone provides detailed usage metrics, accessible via your account, allowing you to manage indexes efficiently. For example, we might want to delete information we no longer need, add targeted data, or optimize usage per user.
Pinecone provides three key metrics:
You can download detailed information on your consumption by going to your Pinecone account, selecting Usage, and then clicking on the Download button, as shown here:
Figure 3.4: Downloading Pinecone usage data
The download file is in CSV format and contains a detailed account of our Pinecone usage, such as BillingAccountId (account identifier), BillingAccountName (account name), OrganizationName (organization name), OrganizationId (organization ID), ProjectId (project identifier), ProjectName (project name), ResourceId (resource identifier), ResourceName (resource name), ChargePeriodStart (charge start date), ChargePeriodEnd (charge end date), BillingPeriodStart (billing start date), BillingPeriodEnd (billing end date), SkuId (SKU identifier), SkuPriceId (SKU price ID), ServiceName (service name), ChargeDescription (charge details), CloudId (cloud provider), RegionId (region), Currency (currency type), PricingQuantity (usage quantity), PricingUnit (usage unit), ListCost (listed cost), EffectiveCost (calculated cost), BilledCost (final cost), and Metadata (additional data).
As AI slowly enters its industrial age, straying away from the initial excitement of the early 2020s, continuous monitoring of these metrics becomes increasingly critical.
We will now check whether the index we selected exists. The program imports the pinecone and time modules to insert a sleep time before checking whether the index exists:
import time
import pinecone
# check if index already exists (it shouldn't if this is first time)
if index_name not in pc.list_indexes().names():
If the index exists, the following code will be skipped to avoid creating duplicate indexes. If not, an index is created:
# if does not exist, create index
pc.create_index(
index_name,
dimension=1536, # dimension of the embedding model
metric='cosine',
spec=spec
)
# wait for index to be initialized
time.sleep(1)
The parameters are as follows:
- index_name, which is the name of our Pinecone index, genai-v1
- dimension=1536, the dimensionality of the embedding vectors
- metric='cosine', which sets the distance metric for similarity searches to cosine similarity
- spec=spec, which defines the region and the serverless specification we defined previously for the cloud services
- time.sleep(1), which makes the program wait to make sure the index is fully created before continuing
If the index has just been created, the output shows its details with total_vector_count set to 0 (if you see a number other than 0, the notebook has likely already been run):
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {},
'total_vector_count': 0}
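The metric='cosine' setting means that matches are ranked by the angle between vectors rather than their magnitude. A minimal pure-Python illustration of the score Pinecone computes (this sketch is not part of the notebook):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # ~1.0 (parallel vectors)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
```

Because cosine similarity ignores vector length, two chunks that express the same idea with different word counts can still score close to 1.0.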
If the index already exists, the statistics will be displayed, including index_fullness to monitor the space used in your index pod from 0 to 1:
Index stats
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {'genaisys': {'vector_count': 3}},
'total_vector_count': 3}
In this case, we haven't populated the index yet. We can connect to the index we just created and display its statistics before populating it:
# connect to index
index = pc.Index(index_name)
# view index stats
index.describe_index_stats()
The output displays the information, confirming that we are connected:
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {'genaisys': {'vector_count': 0}},
'total_vector_count': 0}
The selected embedding model must match Pinecone's index dimension (1536). We will create the parameters of a Pinecone index interactively when we begin working on use cases in Chapter 5. Here, we are using embedding_model="text-embedding-3-small" with its 1,536 dimensions, which matches the dimension of the Pinecone index.
Note also that the 'genaisys' namespace we initialized is taken into account. This ensures that when we upsert the scenarios we designed, they will not be confused with the classical data that is in another namespace of the same index. We are now ready to upsert the data to our Pinecone index.
Upserting embedded chunks into a Pinecone index comes with a cost, as explained at the beginning of this section. We must carefully decide which data to upsert. If we upsert all the data, we might do the following:
If we choose not to upsert the data, we have two options:
We first initialize the libraries and start a timer to measure how long it takes to run the script:
import pinecone
import time
import sys
start_time = time.time() # Start timing before the request
The program must then calculate the maximum size of each batch we send to Pinecone. It is set to 4,000,000 bytes, or 4 MB, to play it safe. If the limit is reached, the batch size is returned:
# Function to calculate the size of a batch
def get_batch_size(data, limit=4000000): # limit set to 4MB to be safe
total_size = 0
batch_size = 0
for item in data:
item_size = sum([sys.getsizeof(v) for v in item.values()])
if total_size + item_size > limit:
break
total_size += item_size
batch_size += 1
return batch_size
We now need an upsert function that takes the batch size into account when called:
# Upsert function with namespace
def upsert_to_pinecone(batch, batch_size, namespace="genaisys"):
"""
Upserts a batch of data to Pinecone under a specified namespace.
"""
try:
index.upsert(vectors=batch, namespace=namespace)
print(
f"Upserted {batch_size} vectors to namespace '{namespace}'."
)
except Exception as e:
print(f"Error during upsert: {e}")
In production, we would typically exit on error, but for this educational notebook, printing helps us observe without stopping execution.
Note that we will upsert the instruction scenarios into the genaisys namespace within the Pinecone index. We can now define the main batch upsert function:
def batch_upsert(data):
The function begins by determining the total length of the data and then prepares batches that match the batch size calculated by the get_batch_size function. It then creates each batch and sends it to the upsert_to_pinecone function we defined:
# Function to upsert data in batches
def batch_upsert(data):
total = len(data)
i = 0
while i < total:
batch_size = get_batch_size(data[i:])
batch = data[i:i + batch_size]
if batch:
upsert_to_pinecone(batch, batch_size, namespace="genaisys")
i += batch_size
print(f"Upserted {i}/{total} items...") # Display current progress
else:
break
print("Upsert complete.")
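Before wiring this loop to a live index, the control flow can be dry-run by replacing index.upsert with a plain list that records what would be sent (the upserted list and the fake records below are stand-ins, not part of the notebook):

```python
import sys

# Same sizing logic as above, reproduced for the dry run
def get_batch_size(data, limit=4000000):
    total_size = 0
    batch_size = 0
    for item in data:
        item_size = sum(sys.getsizeof(v) for v in item.values())
        if total_size + item_size > limit:
            break
        total_size += item_size
        batch_size += 1
    return batch_size

upserted = []  # Stand-in for the Pinecone index: records each batch sent

def upsert_to_pinecone(batch, batch_size, namespace="genaisys"):
    upserted.append(list(batch))
    print(f"Upserted {batch_size} vectors to namespace '{namespace}'.")

def batch_upsert(data):
    total = len(data)
    i = 0
    while i < total:
        batch_size = get_batch_size(data[i:])
        batch = data[i:i + batch_size]
        if batch:
            upsert_to_pinecone(batch, batch_size, namespace="genaisys")
            i += batch_size
            print(f"Upserted {i}/{total} items...")
        else:
            break
    print("Upsert complete.")

# Three tiny fake records: all fit in one 4 MB batch
data = [{"id": str(n), "values": [0.1] * 4, "metadata": {"text": "t"}}
        for n in range(3)]
batch_upsert(data)
```

With records this small, the three items fit into a single batch, so upsert_to_pinecone is called once; shrinking the limit would split them across multiple calls.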
When the upsert is completed, the output will display a success message. We can now prepare the upsert process: a Pinecone index requires an ID for each vector, which we will now create:
# Generate IDs for each data item
ids = [str(i) for i in range(1, len(chunks) + 1)]
Once each embedded chunk has an ID, we need to format the data to fit Pinecone's index structure:
# Prepare data for upsert
data_for_upsert = [
{"id": str(id), "values": emb, "metadata": {"text": chunk}}
for id, (chunk, emb) in zip(ids, zip(chunks, embeddings))
]
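With toy values, we can check the record format this comprehension produces (the chunks and embedding values here are made up for illustration):

```python
# Toy chunks and embeddings (illustrative values only)
chunks = ["chunk one", "chunk two"]
embeddings = [[0.1, 0.2], [0.3, 0.4]]

# Generate IDs for each data item, as done above
ids = [str(i) for i in range(1, len(chunks) + 1)]

# Prepare data for upsert in Pinecone's expected record format
data_for_upsert = [
    {"id": str(id), "values": emb, "metadata": {"text": chunk}}
    for id, (chunk, emb) in zip(ids, zip(chunks, embeddings))
]
print(data_for_upsert[0])
# {'id': '1', 'values': [0.1, 0.2], 'metadata': {'text': 'chunk one'}}
```

Storing the chunk text in the metadata field is what later lets the query functions return human-readable text alongside each vector match.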
The data is now formatted with an ID, values (embeddings), and metadata (the chunks). Let's call the batch_upsert function that will call the related functions we created:
# Upsert data in batches
batch_upsert(data_for_upsert)
When the upserting process is finished, the number of vectors upserted to the namespace and the time it took are displayed:
Upserted 3 vectors to namespace 'genaisys'.
Upserted 3/3 items...
Upsert complete.
Upsertion response time: 0.45 seconds
We can also display the statistics of the Pinecone index:
#You might have to run this cell after a few seconds to give Pinecone
#the time to update the index information
print("Index stats")
print(index.describe_index_stats(include_metadata=True))
Note that you might have to wait a few seconds to give Pinecone time to update the index information.
The output displays the information:
Index stats
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {'genaisys': {'vector_count': 3}},
'total_vector_count': 3}
The information displayed is as follows:
- 'dimension': 1536: The dimension of the embeddings.
- 'index_fullness': 0.0: A value between 0 and 1 that shows how full the Pinecone index is. We must monitor this value to optimize the data we are upserting to avoid having to increase the size of the storage capacity we are using. For more information, consult the Pinecone documentation at https://docs.pinecone.io/guides/get-started/overview.
- 'namespaces': {'genaisys': {'vector_count': 3}}: Displays the namespace and vector count.
- 'total_vector_count': 3: Displays the total vector count in the Pinecone index.
We are now ready to upload the classical data into its namespace.
Building a GenAISys involves multiple teams. So that each team can work in parallel to optimize production times, we will upsert the classical data in a separate program/notebook. One team can work on instruction scenarios while another gathers and processes data.
Open Pinecone_RAG.ipynb. We will be reusing several components of the Pinecone_instruction_scenarios.ipynb notebook built in the Building a dynamic Pinecone index section of this chapter. Setting up the environment is identical to the previous notebook. The Pinecone index is the same, genai-v1. The namespace for source-data upserting is data01, as we've already established in earlier sections, to make sure the data is separated from the instruction scenarios. So, the only real difference is the data we load and the chunking method. Let's get into it!
This section embeds chunks using the same process as for instruction scenarios in Pinecone_instruction_scenarios.ipynb. However, this time, GPT-4o does the chunking. When importing lines of instruction scenarios, we wanted to keep the integrity of the scenario in one chunk to be able to provide a complete set of instructions to the generative AI model. In this case, we will leverage the power of generative AI and chunk raw text with GPT-4o.
We begin by downloading data, not scenarios, and setting the path of the file:
download("Chapter03","data01.txt")
# Load the CSV file
file_path = '/content/data01.txt'
Now, the text file is loaded as one big chunk in a variable and displayed:
try:
with open(file_path, 'r') as file:
text = file.read()
text
except FileNotFoundError:
text = "Error: File not found. Please check the file path."
print(text)
While a production application would typically exit on a critical FileNotFoundError, for this educational notebook, printing the error allows us to observe the outcome without interrupting the learning flow.
You can comment out print(text) or print only a few lines. In this case, let's verify that we have correctly imported the file. The output shows that we did:
The CTO was explaining that a business-ready generative AI system (GenAISys) offers functionality similar to ChatGPT-like platforms…
The text contains a message from the CTO of the company whose data we are uploading to our custom RAG database. A company might have thousands of such internal messages—far too many (and too volatile) to justify model fine-tuning. Storing only the key chunks in Pinecone gives us searchable context without flooding the index with noise.
The text variable is not yet ready to be chunked by GPT-4o. The first step is to create an OpenAI instance and give the GPT-4o model instructions:
# Import libraries
from openai import OpenAI
# Initialize OpenAI Client
client = OpenAI()
# Function to chunk text using GPT-4o
def chunk_text_with_gpt4o(text):
# Prepare the messages for GPT-4o
messages = [
{"role": "system", "content": "You are an assistant skilled at splitting long texts into meaningful, semantically coherent chunks of 50-100 words each."},
{"role": "user", "content": f"Split the following text into meaningful chunks:\n\n{text}"}
]
Now we send the request to the API:
# Make the GPT-4o API call
response = client.chat.completions.create(
model="gpt-4o", # GPT-4o model
messages=messages,
temperature=0.2, # Low randomness for consistent chunks
max_tokens=1024 # Sufficient tokens for the chunked response
)
We need to keep an eye on the max_tokens=1024 setting: GPT-4o will stop generating once it hits that limit. For very large documents, you can stream the text in smaller slices—then let GPT-4o refine each slice. We can also use ready-made chunking functions that will break the text down into optimized chunks to obtain more nuanced and precise results when retrieving the data. However, in this case, let's maximize the usage of GPT-4o; we send the entire file in one call with a low temperature so we can watch the model partition a real-world document from end to end.
Now we can retrieve the chunks from the response, clean them, store them in a list of chunks, and return the chunks variable:
# Extract and clean the response
chunked_text = response.choices[0].message.content
chunks = chunked_text.split("\n\n") # Assume GPT-4o separates chunks with double newlines
return chunks
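The post-processing relies entirely on the double-newline convention: if the model separates chunks differently, the split produces a single chunk. A quick check of that assumption with a made-up response string (illustrative only):

```python
# A stand-in for GPT-4o's chunked response (illustrative text)
chunked_text = "First chunk of text.\n\nSecond chunk of text.\n\nThird chunk."
chunks = chunked_text.split("\n\n")
print(len(chunks))  # 3
print(chunks[1])    # Second chunk of text.
```

If this assumption worries you in production, instructing the model explicitly to separate chunks with blank lines, or asking for a structured output format, makes the split more reliable.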
Now, we can call the chunking function. We don't have to display the chunks and could comment out this code in production. However, in this case, let's verify that everything is working:
# Chunk the text
chunks = chunk_text_with_gpt4o(text)
# Display the chunks
print("Chunks:")
for i, chunk in enumerate(chunks):
print(f"\nChunk {i+1}:")
print(chunk)
The output shows that the chunks were successfully created:
Chunks:
Chunk 1:
The CTO was explaining that …
Chunk 2:
GenAISys relies on a generative AI model…
Chunk 3:
We defined memoryless, short-term, long-term…
The remaining embedding and upsert steps are identical to those in Pinecone_instruction_scenarios.ipynb—just remember to use namespace="data01" when writing the vectors. After that, we're ready to query the index and verify retrieval.
As you know, our vector store now has two logical areas—genaisys for instruction scenarios and data01 for classical data. In this section, we'll query each area interactively to prove the retrieval code works before we wire it into the multi-user interface in Chapter 4. We will query these two namespaces in the Pinecone index, as shown in Figure 3.5:
Figure 3.5: Generative AI model querying either the instruction scenarios or the data
Open Query_Pinecone.ipynb to run the verification queries. The next steps are the same as those in the Setting up the environment and Creating the Pinecone index sections, except for two minor differences:
- index_name = 'genai-v1' is the only index reference needed, because the querying function will manage the choice of a namespace.
- The Upserting section of the notebook has been removed because we are not upserting but querying the Pinecone index.
The Query section of the notebook is divided into two subsections. The first subsection contains the querying functions and the second one the querying requests. Let's begin with the querying functions.
There are four querying functions, as follows:
- QF1: query_vector_store(query_text, namespace), which receives the query, sends the request to QF2, and returns the response. It will use QF4 to display the results.
- QF2: get_query_results(query_text, namespace), which receives the query from QF1, sends it to QF3 to be embedded, makes the actual query, and returns a response to QF1.
- QF3: get_embedding(text, model=embedding_model), which receives text to embed from QF2 and sends the embedded text back to QF2.
- QF4: display_results(query_results), which receives the results from QF1, processes them, and returns them to QF1.
We can simplify the representation, as shown in Figure 3.6, by creating two groups of functions:
- The first group contains QF1, query_vector_store, and QF4, display_results, in which QF1 queries the vector store through QF2 and returns the results to display.
- The second group contains QF2, get_query_results, which queries the vector store after embedding the query with QF3, get_embedding, and returns the results to QF1.
Figure 3.6: Querying the vector store with two groups of functions
Let's begin with the first group of functions.
The first function, QF1, receives the user input:
def query_vector_store(query_text, namespace):
print("Querying vector store...")
Then, the function calls QF2, get_query_results, and stores the response in query_results:
# Retrieve query results
query_results = get_query_results(query_text, namespace)
QF2 then returns the results in query_results, which, in turn, is sent to display_results to obtain the text and target ID:
# Process and display the results
print("Processed query results:")
text, target_id = display_results(query_results)
return text, target_id
display_results processes the query results it receives and searches the metadata of each match for the text stored in the Pinecone index. When the text is found, the function retrieves the ID:
def display_results(query_results):
for match in query_results['matches']:
print(f"ID: {match['id']}, Score: {match['score']}")
if 'metadata' in match and 'text' in match['metadata']:
text=match['metadata']['text']
#print(f"Text: {match['metadata']['text']}")
target_id = query_results['matches'][0]['id'] # Get the ID from the first match
#print(f"Target ID: {target_id}")
else:
print("No metadata available.")
return text, target_id
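We can exercise this logic offline by feeding the function a synthetic query result shaped like Pinecone's response (the fake_results dictionary and its values are invented for this test):

```python
def display_results(query_results):
    # Same logic as above, exercised on a synthetic result
    for match in query_results['matches']:
        print(f"ID: {match['id']}, Score: {match['score']}")
        if 'metadata' in match and 'text' in match['metadata']:
            text = match['metadata']['text']
            target_id = query_results['matches'][0]['id']
        else:
            print("No metadata available.")
    return text, target_id

fake_results = {
    'matches': [
        {'id': '2', 'score': 0.22,
         'metadata': {'text': 'Sentiment analysis instructions...'}}
    ]
}
text, target_id = display_results(fake_results)
print(text)        # Sentiment analysis instructions...
print(target_id)   # 2
```

This kind of stub test is handy when iterating on the display logic without paying for live queries.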
The text and ID are returned to QF1, query_vector_store, which, in turn, returns the results when the function is called. Note that for educational purposes, this function assumes query_results will always contain at least one match with 'metadata' and 'text' fields. Let's now see how the query is processed.
The program queries the Pinecone index with get_query_results with the input text and namespace provided. But first, the input text must be embedded to enable a vector similarity search in the vector store:
def get_query_results(query_text, namespace):
# Generate the query vector from the query text
query_vector = get_embedding(query_text) # Replace with your method to generate embeddings
Once the input is embedded, a vector search is requested with the vectorized input within the namespace specified:
# Perform the query
query_results = index.query(
vector=query_vector,
namespace=namespace,
top_k=1, # Adjust as needed
include_metadata=True
)
Note that top_k is set to 1 in this example to retrieve a single top result for precision, and include_metadata is set to True to include the corresponding text. The results are then returned to QF1 in query_results:
# Return the results
return query_results
The embedding function is the same as the one we used to upsert the data into the Pinecone index:
import openai
client = openai.OpenAI()
embedding_model = "text-embedding-3-small"
def get_embedding(text, model=embedding_model):
text = text.replace("\n", " ")
response = client.embeddings.create(input=[text], model=model)
embedding = response.data[0].embedding
return embedding
Make sure to use the same model to embed queries as you did to embed the upserted data so that the embedded input is in the same vector space as the stored embeddings. This is critical for accurate similarity calculations.
We're now ready to run two tests: an instruction scenario query (namespace genaisys) and a source data query (namespace data01).
To retrieve an instruction scenario, we will enter a user input and the namespace to let the system find the closest instruction to perform:
# Define your namespace and query text
namespace = "genaisys" # Example namespace
query_text = "The customers like the idea of travelling and learning. Provide your sentiment."
The system should detect the task briefly asked for and return a comprehensive instruction scenario. For that, we'll call the entry point of the functions, query_vector_store, and display the output returned:
# Call the query function
text, target_id = query_vector_store(query_text, namespace)
# Display the final output
print("Final output:")
print(f"Text: {text}")
print(f"Target ID: {target_id}")
The output is satisfactory and is ready to be used in Chapter 4 in a conversational loop:
Querying vector store...
Processed query results:
ID: 2, Score: 0.221010014
Querying response time: 0.54 seconds
Final output:
Text: 200,Sentiment analysis Read the content return a sentiment analysis nalysis on this text and provide a score with the label named : Sentiment analysis score followed by a numerical value between 0 and 1 with no + or - sign and add an explanation to justify the score.
Target ID: 2
The program now retrieves data from the Pinecone index. The query functions are identical since the namespace is a variable. Let’s just look at the query and output. The query is directed to the data namespace:
# Define your namespace and query text
namespace = "data01" # Example namespace
query_text = "What did the CTO say about the different types of memory?"
The result is printed:
# Display the final output
print("Final output:")
print(f"Text: {text}")
print(f"Target ID: {target_id}")
The output is satisfactory:
Querying vector store...
Processed query results:
ID: 3, Score: 0.571151137
Querying response time: 0.45 seconds
Final output:
Text: We defined memoryless, short-term, long-term memory, and cross-topic memory. For the hybrid travel marketing campaign, we will distinguish semantic memory (facts) from episodic memory (personal events in time, for example). The CTO said that we will need to use episodic memories of past customer trips to make the semantic aspects of our trips more engaging.
Target ID: 3
We have thus populated a Pinecone vector store and queried it. Let’s summarize the implementation of the Pinecone index before we move on to adding more layers to our GenAISys.
In this chapter, we pushed our GenAISys project another step forward by moving beyond ordinary RAG. First, we layered expert-written instruction scenarios on top of the source data corpus, turning a static RAG pipeline into a dynamic framework that can fetch not only facts but also the exact reasoning pattern the model should follow. The global market is accelerating so quickly that users now expect ChatGPT-level assistance the moment a need arises; if we hope to keep pace, our architecture must be flexible, cost-aware, and capable of near-real-time delivery.
We began by laying out that architecture, then introduced the law of diminishing returns to determine when an implicit similarity search is worth its compute bill and when a direct, explicit call—such as a simple web search—will do the job more cheaply. With the theory in place, we wrote a program to download, chunk, embed, and upsert the instruction scenarios into a dedicated namespace inside a Pinecone index. Next, we enlisted GPT-4o to perform the same chunk-and-embed routine on the source documents, storing those vectors in a second namespace. Once both partitions were in place, we verified the retrieval layer: a single query function now routes any prompt to the correct namespace and returns the best match along with its metadata.
With scenarios and data cleanly separated yet instantly searchable, the GenAISys has the retrieval backbone it needs. In the next chapter, we will plug these components into the conversational loop and let the system demonstrate its full, business-ready agility.
Unlock this book’s exclusive benefits now: scan this QR code or go to packtpub.com/unlock, then search for this book by name.
Note: Keep your purchase invoice ready before you start.
Businesses today need to design, produce, and deliver goods and services at a speed never attained before. Responsiveness has become key in nearly every field, from online cloud services to delivering food, medication, clothing, and so on. Such an event-driven economy produces an endless stream of tasks, and only an equally event-driven, human-centered generative AI system (GenAISys) can keep pace.
Human judgment still anchors even the most automated workflows: when fires break out, storms destroy infrastructure, or supply chains falter, teams—not algorithms alone—must act. An advanced GenAISys that leaves people out of the loop is a myth. This chapter, therefore, begins by outlining an architecture that tears down the walls between users and AI to create a collaborative, multi-user chatbot.
First, we sketch the event-driven GenAISys interface at a high level, showing how the building blocks from earlier chapters—short-term, episodic, and long-term memory, the multi-turn conversational agent, and twin RAG pipelines for instruction scenarios and data—fit together. To then implement the responsive system, we must code the processes of the GenAISys and then the conversational agent that will manage the generative AI agent. Once our GenAISys interface is built, we will run a multi-user, multi-turn conversation with three users working in an online travel agency. Their online meeting will include a conversational AI agent as a participant.
These users will be able to have an online meeting with or without the AI agent. They will be able to utilize RAG to find instruction scenarios or simply ask the generative AI agent to answer a question. By the end of the chapter, we will have a fully working GenAISys interface ready for the multimodal chain-of-thought extensions in Chapter 5.
In a nutshell, this chapter covers the following topics:
Our first task is to define an event-driven GenAISys interface.
Our event-driven GenAISys interface integrates the functionality we built in the previous chapters. The interface will leverage the flexibility of IPython widgets to create a reactive event-driven environment in which the following apply:
We will first examine the program we are building at a high level, as represented in Figure 4.1:
Figure 4.1: High-level architecture of the GenAISys interface
Let’s go through the functions we have already built in the previous chapters and also list the key ones we are adding in this chapter:
To build this architecture, we will do the following:
The decision to present the main components of the GenAISys architecture (in this chapter and the next) without arrows is a deliberate choice designed to convey a core concept: modularity and architectural flexibility. The figure is not a rigid blueprint but rather a conceptual toolkit. It shows you the powerful components at your disposal—I1. AI controller, I2. Multi-user chatbot, F1. Generative AI model, F2. Memory retention, F3. Modular RAG, and F4. Multifunctional capabilities—as independent, interoperable blocks. This empowers you, illustrating that you are free to design your own system architecture. For instance, a user could choose to run some functional components, such as F4. Multifunctional capabilities, as independent, distributed agents that are called upon by the controller. Alternatively, they could implement a completely different interface or even run the system headlessly without one.
However, the focus of this architecture is on demonstrating a human-centered GenAISys. In this configuration, I1. AI controller (the generative AI IPython interface) serves as the central hub and orchestrator. This human-centered architecture guarantees full control and transparency. This is essential to build trust in risk-averse corporate environments. The control flow, while not drawn with arrows, is implicit: user interactions from I2. Multi-user chatbot are managed by the AI controller, which then strategically delegates tasks to the various functional components (F1 to F4) to generate responses, access memory, perform RAG, or execute specific functions. This approach provides a clear, stable, and explainable pathway to building a business-ready generative AI system.
Let’s first explore scenario-driven task execution.
Let’s begin by building the GenAISys interface shown in Figure 4.2, using IPython widgets to create a responsive, event-driven environment. The result will be a dynamic multi-user chat surface with drop-down menus, text-input fields, and a checkbox—everything needed for real-time collaboration between people/users and the generative AI agent.
Open the Event-driven_GenAISys_framework.ipynb notebook in the Chapter04 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). Setting up the environment is the same as described in the previous chapters:
An additional package (ipython) is required for the notebook environment. IPython is pre-installed in Google Colab; if needed, install it using the following:
!pip install ipython
The code we’ll build demonstrates core concepts such as event-driven interactions, dynamic content updating, and modular function organization. By the end of this section, you will have learned how to bridge the gap between AI functionality and end user engagement.
Figure 4.2: The flowchart of an event-driven GenAISys interface
The main groups of functions required to build this interface are the following:
Before diving into the code from a developer’s perspective, let’s keep the user’s point of view in mind. We must build an intuitive interface that can seamlessly execute the flow outlined in Figure 4.2.
Figure 4.3: GenAISys from a user’s perspective
The UI contains only three widgets: an input box for entering prompts, a drop-down list for selecting active users, and a checkbox for activating and deactivating the conversational AI agent.
Let’s walk through the process of setting up and running this interactive GenAISys environment.
The program starts from the Multi-user conversation with the agent as a participant cell. We first import the modules and libraries we need, starting with IPython:
from IPython.display import display, HTML, clear_output
Let’s go through each functionality we will be implementing in Google Colab:
display and HTML to display objects such as widgets, images, and rich HTML outputs
clear_output to clear the output of a cell
Then, we import ipywidgets managed by the Jupyter project:
from ipywidgets import Dropdown, Text, Checkbox, VBox, Layout
ipywidgets is the core component of the interactive interface in this notebook, in which we will use the following widgets:
Dropdown: A drop-down widget to select a value from a list of options
Text: A widget for text input from a user
Checkbox: A widget for Boolean checked/unchecked input
VBox: A container widget to arrange child widgets in a vertical box layout
Layout: To customize the style of the widgets with layout properties such as width, height, and margin
Finally, we import JSON, used to store multi-user conversation histories:
import json
We then initialize the conversation histories for all users, define the first active user, and set the active conversation to True:
# Initialize conversation histories for all users and active user
user_histories = {"User01": [], "User02": [], "User03": []}
active_user = "User01" # Default user
conversation_active = True
We are thus, from the start, initializing a multi-user collaborative GenAISys in which the users can be human prompts and system prompts. For example, a “user” could be a message from another system and triggered in this interface by an event that reads pending messages. The user list can be expanded, stored in variables, or utilized in any user management system that suits a project’s needs, including access rights, passwords, and roles for various applications. Next, we initialize the widgets themselves.
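To make the idea of an expandable user list concrete, here is a hypothetical helper (not part of the notebook's code) showing how a new participant, human or system, could be registered at runtime. Registering a user simply means giving it an empty history; the drop-down options could then be rebuilt from the dictionary's keys. The name `SupplyChainBot` is purely illustrative.

```python
# Initialize conversation histories as in the notebook
user_histories = {"User01": [], "User02": [], "User03": []}

def register_user(user_id):
    """Add a new participant (human or system) with an empty history."""
    user_histories.setdefault(user_id, [])  # no-op if the user already exists
    return sorted(user_histories.keys())    # refreshed list for a Dropdown

# A "user" can also be another system posting messages into the conversation
options = register_user("SupplyChainBot")
print(options)
```

In a production setting, this function would be the natural place to hook in access rights, passwords, and roles.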
The code now sets up the Dropdown, Text, and Checkbox widgets we need. The widgets are also linked to event handlers. The Dropdown widget for the users defines the three users initialized at the start of the conversation:
# Create a dropdown to select the user
user_selector = Dropdown(
options=["User01", "User02", "User03"],
value=active_user,
description='User:',
layout=Layout(width='50%')
)
The selector has four parameters:
options lists the available users; the list can be expanded and can access any user management repository as needed for your project.
value determines the active user. The program starts with User01 as the initial user. This can be automated when an authorized user first connects to the GenAISys.
description provides a label for the drop-down list that will be displayed.
layout sets the width of the widget that will be displayed.
Note that we are creating a core GenAISys, not a platform. The goal is to grasp the inner workings of a GenAISys. Once it works as expected, we can then add the classical layers of user management (names, roles, and rights). In this case, we are remaining focused on the flexible core concepts of GenAISys, not how they will be encapsulated in a specific platform and framework. We are learning how to be generative AI agentic architects, not operators of a specific framework.
The next step is to insert an event handler. In this case, it is an event listener that will detect when the value of user_selector changes. When another user is selected, the on_user_change function is automatically called, and value becomes the new user:
user_selector.observe(on_user_change, names='value')
This dynamic change in users within a GenAISys conversation represents a major evolution from the one-on-one chatbots. It introduces a whole new dimension to collaborative teamwork with AI as a co-participant.
The second widget to activate is the input widget:
# Create the input box widget
input_box = Text(placeholder="Type your message here or type 'exit' to end the conversation.", layout=Layout(width='100%'))
The input can be any text and will occupy 100% of the UI layout. The conversation ends when a user enters exit or quit. When the text is typed and the Enter key is pressed, the event handler takes over:
input_box.on_submit(handle_submit) # Attach the on_submit event handler
on_submit is a method of the input_box widget. handle_submit is a callback function that we can write as we wish and will be described later in this section.
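The callback flow can be followed without a notebook by replacing the widget with a minimal stand-in. The sketch below is an assumption-laden simulation: `FakeInputBox` mimics only the parts of the Text widget that `handle_submit` relies on (`.value` and `on_submit`), and `chat` is replaced by a stub that records messages.

```python
received = []

def chat(user_message):
    # Stub for the real orchestrator: just record what arrives
    received.append(user_message)

def handle_submit(sender):
    user_message = sender.value
    if user_message.strip():
        sender.value = ""  # clear the input box, as in the notebook
        chat(user_message)

class FakeInputBox:
    """Hypothetical test double for the ipywidgets Text widget."""
    def __init__(self):
        self.value = ""
        self._callback = None
    def on_submit(self, callback):
        self._callback = callback
    def submit(self, text):  # simulates the user pressing Enter
        self.value = text
        self._callback(self)

box = FakeInputBox()
box.on_submit(handle_submit)
box.submit("Hello GenAISys")
box.submit("   ")  # whitespace-only input is ignored by handle_submit
print(received)
```

Note how the whitespace-only submission never reaches `chat`, matching the `user_message.strip()` guard in the real handler.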
The third widget is the checkbox for the AI conversational agent:
# Create a checkbox to toggle agent response
agent_checkbox = Checkbox(
value=True,
description='Agent',
layout=Layout(width='20%')
)
The checkbox displays the description label, which is Agent in this case. The layout will occupy 20% of the UI. If value is set to True, then the conversational AI agent will be activated. We will build the AI agent in the Conversational agent section of this chapter. The AI agent will also be event-driven.
The UI box is now ready to be displayed.
The UI container widget now combines the three event-driven widgets we defined in VBox (V stands for vertical; i.e., in a vertical box). The three widgets are in brackets:
# Display the initial interface
display(
VBox(
[user_selector, input_box, agent_checkbox],
layout=Layout(
display='flex', flex_flow='column',
align_items='flex-start', width='100%'
)
))
The layout is then defined:
layout=Layout(
display='flex', flex_flow='column',
align_items='flex-start', width='100%'
)))
The parameters of this responsive UI are the following:
display='flex' activates the CSS flexbox model for layouts dynamically without specifying the sizes of the items
flex_flow='column' arranges the child widgets vertically
align_items='flex-start' aligns the widgets to the start (left side) of the UI container
width='100%' makes the container take up the full width of the available space
With that, the UI is ready. We can choose to begin with any of the three widgets. The user selector can be run before the input, as well as the AI agent checkbox. In this case, the user selector was set to a default value, User01, and the AI agent checkbox was set to the default value, True.
根据项目需求,这三个组件及其相关流程可以集成到任何经典的网页或软件界面中。由于输入没有默认值,我们继续讨论输入组件。
The three widgets and their processes can be built into any classical web or software interface, depending on your project’s needs. Since there is no default value for the input, let’s continue with the input widget.
The input text is managed by the UI described in the previous section, which triggers input_box.on_submit(handle_submit) when a user enters text. In turn, the submit method calls the handle_submit function:
# Function to handle the submission of the input
def handle_submit(sender):
user_message = sender.value
if user_message.strip():
sender.value = "" # Clear the input box
chat(user_message)
Now, the function does three things:
user_message = sender.value processes the text received from the input widget
if user_message.strip() checks whether there is a message and clears the input box for the next input with sender.value = ""
chat(user_message) is called if there is a message
chat(user_message) is the next process and a key event processing hub for the GenAISys. Let’s go through it.
该chat(user_message)函数是我们事件驱动型 GenAISys 的一个协调器组件。它应该继续以人为本,以服务于关键的人类需求。控制。一旦系统获得用户信任并经过仔细考虑,其管理的某些操作可以由系统消息触发。协调器在处理从函数接收到的用户消息时会做出重要决策。它封装了多个选择和功能,如图 4.2handle_submit(sender)所示:决定是否继续对话、将对话历史记录追加或保存到文件、确定是否调用 AI 对话代理以及更新 UI 显示。
The chat(user_message) function is an orchestrator component of our event-driven GenAISys. It should remain human-centered for critical human control. Once the system has gained the trust of the users and after careful consideration, some of the actions it manages can be triggered by system messages. The orchestrator contains important decisions when it processes the user message it receives from the handle_submit(sender) function. It encapsulates several choices and functions, as represented in Figure 4.2: deciding whether to continue the conversation, appending or saving the conversation history to a file, determining whether to call the AI conversational agent, and updating the UI display.
It first inherits the global status of the conversation variable (conversation_active = True) we initialized at the start of the conversation (in node 1 of Figure 4.2):
# Function to handle user input and optional bot response
def chat(user_message):
global conversation_active
It continues to determine whether the multiple-turn conversation is over or not by checking whether the user has exited or quit the conversation (see 6 in Figure 4.2):
if user_message.lower() in ['exit', 'quit']:
Let’s see what happens if the user chooses to exit the conversation.
Suppose the user enters exit or quit; then the conversation_active variable we set to True at the start of the conversation (in node 1 of Figure 4.2) will now be set to False. The system now knows that there is no need to update the display anymore. It then tells the clear_output function to wait until the next conversation turn to clear the output to avoid flickering effects:
clear_output(wait=True)
The exit process continues by displaying a message signaling the end of the conversation and indicating that the conversation history is being saved:
display(HTML("<div style='color: red;'><strong>Conversation ended. Saving history...</strong></div>"))
The exit process ends by calling the save function of the conversation, which will save all history to a file (see node 7 in Figure 4.2):
save_conversation_history()
The conversation is thus saved at the end of the session for further use (for a new session or a meeting summary), as shown in node 7 of Figure 4.2:
# Function to save conversation history to a file
def save_conversation_history():
filename = "conversation_history.json" # Define the filename
with open(filename, 'w') as file:
json.dump(user_histories, file, indent=4) # Write the user histories dictionary to the file in JSON format
display(HTML(f"<div style='color: green;'><strong>Conversation history saved to {filename}.</strong></div>"))
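The same JSON file written by the save function can be reloaded to seed a later session. The round trip below is a self-contained sketch (sample histories invented for illustration) of that persistence step, without the HTML display call:

```python
import json

# Sample histories standing in for a real session's user_histories
user_histories = {
    "User01": [{"role": "user", "content": "Hi"},
               {"role": "assistant", "content": "Hello!"}],
    "User02": [],
    "User03": [],
}

filename = "conversation_history.json"

# Save, as save_conversation_history does
with open(filename, "w") as file:
    json.dump(user_histories, file, indent=4)

# Reloading at the start of a new session restores every user's history
with open(filename) as file:
    restored = json.load(file)

print(restored["User01"][0]["content"])
```

Because the structure is plain JSON, the same file could also feed a meeting-summary step after the session ends.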
Now, let’s go through the process when the user(s) chooses to continue the conversation.
If the user input does not contain exit or quit, then the multi-turn, multi-user conversation will continue. We have a big decision to make with this function, however: do we append the conversation history to each user request or not? If we do, the model receives the complete context, but the growing number of tokens we send through the API will increase processing time and costs.
The first step is to append the history of the conversation we initialized at the start (in node 1 of Figure 4.2):
# Append user message to the active user’s history
user_histories[active_user].append(
    {"role": "user", "content": user_message}
)
So, in the hybrid scenario of this notebook, we save the user history in memory until the end of the session, augmenting each user’s input with their input history, as seen in node 11 of Figure 4.2. If the user input does not contain exit or quit, the multi-turn, multi-user conversation continues and the user message is appended to that user’s history (node 8 of Figure 4.2).
However, if we don’t want to append a user request to it but still want to keep a record of the entire conversation for context, we can also summarize the conversation at the midpoint or the end. If we summarize it during the conversation, we can add a function to append it to the user input each time. If we summarize after the end of a session, we can continue with a new, fresh session with a summary of the previous session’s history.
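The trade-off above can be sketched as a small helper: instead of resending the full history, keep only the last few turns and fold the older ones into a summary message. In the real system, the summary would come from an LLM call; here `summarize()` is a trivial stand-in, and the function name and `window` parameter are illustrative, not part of the notebook.

```python
def summarize(messages):
    # Stand-in for an LLM summarization call
    return f"Summary of {len(messages)} earlier messages."

def compact_history(history, window=4):
    """Return a context list: one summary message plus the last `window` turns."""
    if len(history) <= window:
        return list(history)  # short enough to send as-is
    summary = {"role": "system", "content": summarize(history[:-window])}
    return [summary] + history[-window:]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
context = compact_history(history)
print(len(context), context[0]["content"])
```

The same function could run at the midpoint of a conversation or once at the end of a session to seed the next one, which is exactly the choice discussed above.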
In this notebook, we will implement a hybrid short- and long-term memory process. We can continue the conversation by not entering 'quit' or 'exit'. Now, the chat(user_message) function will check the conversational agent’s checkbox value:
if agent_checkbox.value:
This verification is shown in node 9 in Figure 4.2. If the checkbox is checked, then the functions we created in the previous chapters are activated by calling chat_with_gpt:
response = chat_with_gpt(user_histories[active_user],
user_message)
Once the response is returned, it is appended to the active user’s history described previously:
user_histories[active_user].append(
    {"role": "assistant", "content": response}
)
We now have an entry-point memory framework. The program then calls update_display(), another key function that is shown in node 14 of Figure 4.2. If the agent checkbox is checked, chat_with_gpt will be called.
The chat_with_gpt function assembles the work we did in the previous chapters to create a conversational AI agent with the Pinecone-based RAG functionality. We will fully implement this integration in the Conversational agent section of this chapter.
chat_with_gpt orchestrates the AI conversational agent by providing information, enabling it to be dynamic and responsive. The user history of this conversation and the user message are sent to the chat_with_gpt conversational agent function:
response = chat_with_gpt(user_histories[active_user], user_message)
Once the response is returned, the update_display function is called from chat(user_message).
The update_display function refreshes the UI with the updated conversation history and also displays the status of the widgets. It first tells the UI to wait until a new output arrives by setting wait to True:
def update_display():
clear_output(wait=True)
The function then filters and displays the active user’s history (see node 15 of Figure 4.2):
for entry in user_histories[active_user]: # Show only the active user's history
if entry['role'] == 'user':
display(HTML(f"<div style='text-align: left; margin-left: 20px; color: blue;'><strong>{active_user}:</strong> {entry['content']}</div>"))
elif entry['role'] == 'assistant':
display(HTML(f"<div style='text-align: left; margin-left: 20px; color: green;'><strong>Agent:</strong> {entry['content']}</div>"))
If the conversation is active, the UI VBox is displayed along with the status of the widgets:
if conversation_active:
display(VBox([user_selector, input_box, agent_checkbox])) # Keep input box, selector, and checkbox visible if active
The input box is cleared, the agent checkbox has been checked independently by the user, and the system has verified its status. The active user will be displayed based on the independent decision of the user. In this case, the active user, active_user, who was initialized at the start (1) of the conversation, remains the same. If the user changed, the on_user_change drop-down event (13) would have been triggered by the observe method of the user_selector widget:
user_selector.observe(on_user_change, names='value')
In that case, user_selector.observe will independently call the update active_user function (14) and first make sure that the active user is a global variable:
def on_user_change(change):
global active_user
Then, it will make the new user the active user:
active_user = change['new']
Finally, it will call the update_display function we built in this subsection:
update_display()
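The observer mechanism can also be followed outside a notebook with a minimal simulation. The change dictionary below carries the same `'new'` key that ipywidgets delivers to an `observe` callback; `update_display` is reduced to a stub that records what it would render.

```python
active_user = "User01"
display_calls = []

def update_display():
    # Stub for the real UI refresh: record which user's history would render
    display_calls.append(active_user)

def on_user_change(change):
    global active_user
    active_user = change["new"]
    update_display()

# What user_selector.observe(on_user_change, names='value') would trigger
# when a participant picks User02 in the drop-down:
on_user_change({"new": "User02"})
print(active_user)
```

Switching users therefore both updates the global state and immediately re-renders that user’s conversation history, which is the behavior the real widget wiring produces.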
Now that we have our dynamic UI and event-driven functions in place, let’s implement the conversational agent logic called by chat_with_gpt.
We implemented an AI conversational agent in Chapters 1 and 2 and built the query Pinecone functionality in Chapter 3. Go to the Conversational agent section of the notebook. If needed, take the time to revisit those chapters before proceeding. In this section, it’s time we integrate those components, preparing our GenAISys conversational agent for multi-user sessions.
We begin by importing OpenAI and initializing the client:
from openai import OpenAI
# Initialize the OpenAI client
client = OpenAI()
Next, we decide whether to store the user's full conversation history for each call, optimizing context window size for cost and clarity:
user_memory = True # True=User messages are memorized False=User messages are not memorized
The memory setting should be strategically monitored in production environments. For example, here we set user_memory to True, but we avoid applying it during RAG queries, as historical context could confuse the Pinecone similarity searches. We then define the chat_with_gpt function, which is called in node 10 of Figure 4.2:
def chat_with_gpt(messages, user_message):
The function first searches the input text for a keyword to trigger a RAG retrieval from the Pinecone index as implemented in Query_Pinecone.ipynb and described in Chapter 3. The code first determines the namespace:
try:
namespace=""
if "Pinecone" in user_message or "RAG" in user_message:
# Determine the keyword
if "Pinecone" in user_message:
namespace="genaisys"
elif "RAG" in user_message:
namespace="data01"
print(namespace)
…
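The routing logic above can be isolated into a small helper. This is a sketch; the function name route_namespace is ours, not the notebook's:

```python
def route_namespace(user_message: str) -> str:
    """Map a user message to a Pinecone namespace via keyword detection.

    "Pinecone" routes to the instruction-scenario namespace ("genaisys");
    "RAG" routes to the static-data namespace ("data01"). The checks are
    ordered, so a message containing both keywords goes to "genaisys".
    An empty string means no RAG retrieval is triggered.
    """
    if "Pinecone" in user_message:
        return "genaisys"
    elif "RAG" in user_message:
        return "data01"
    return ""

print(route_namespace("Let's ask Pinecone for ideas."))   # genaisys
print(route_namespace("Let's search RAG."))               # data01
print(route_namespace("What is the capital of France?"))  # (empty string)
```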
If the user message contains “Pinecone,” the query will target the genaisys namespace, which contains the instruction scenarios. The genaisys namespace implementation departs from static data retrieval and takes us into agentic, dynamic decision-making to trigger an instruction or a task. If the user message contains “RAG,” the query will target the data01 namespace, which contains static data. The queries and content of the Pinecone index are those implemented in Chapter 3:
#define query text
query_text=user_message
# Retrieve query results
query_results = get_query_results(query_text, namespace)
# Process and display the results
print("Processed query results:")
qtext, target_id = display_results(query_results)
print(qtext)
Once the query result is returned, we append the user message to it to augment the input:
#run task
sc_input=qtext + " " + user_message
mrole = "system"
mcontent = "You are an assistant who executes the tasks you are asked to do."
user_role = "user"
The message parameters and the OpenAI API call are described in the Setting up the environment section of Chapter 1. The OpenAI response is stored in task_response:
task_response = openai_api.make_openai_api_call(
sc_input,mrole,mcontent,user_role
)
print(task_response)
The response returned by the OpenAI API call, augmented with the result of the Pinecone query, is stored in aug_output:
aug_output=namespace + ":" +task_response
If the user message does not contain a keyword to trigger the RAG function, the user request will be sent directly to the OpenAI API call, and the response will be stored in aug_output. However, the system must first check whether user_memory is True or not. The system must also extract the text content of user_message:
else:
if user_memory:
# Extract ALL user messages from the conversation history
user_messages_content = [
msg["content"] for msg in messages
if msg["role"] == "user" and "content" in msg
]
# Combine all extracted user messages into a single string
combined_user_messages = " ".join(user_messages_content)
# Add the current user_message to the combined text
umessage = f"{combined_user_messages} {user_message}"
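In isolation, the history-merging step behaves as follows; the helper name merge_user_history and the sample messages list are invented for illustration:

```python
def merge_user_history(messages, user_message):
    # Collect every prior user turn and prepend it to the new message,
    # mirroring the user_memory branch above.
    user_messages_content = [
        msg["content"] for msg in messages
        if msg["role"] == "user" and "content" in msg
    ]
    combined_user_messages = " ".join(user_messages_content)
    return f"{combined_user_messages} {user_message}"

# Hypothetical history: one prior exchange about France.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
print(merge_user_history(messages, "What is there to visit?"))
# What is the capital of France? What is there to visit?
```

Note that assistant turns are filtered out: only the user's own wording is carried forward, which keeps the augmented prompt compact.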
In this case, umessage now contains a concatenation of the conversation history of the active user extracted and stored in combined_user_messages and the user message itself in user_message. The generative AI model now has complete context about the dialogue with this user.
The strategy for managing conversation history will depend heavily on each real-world use case. For example, we might choose to extract the history of all users involved in a session or only specific users. Alternatively, a team could decide to use a single shared username throughout an entire conversation. Generally, the best practice is to organize workshops with end users to define and configure the conversation-memory strategies that best fit their workflow.
In some cases, we might decide to ignore the conversation history altogether. In that scenario, we set the user_memory parameter to False, and the system disregards prior exchanges:
else:
umessage = user_message
The umessage variable is now ready to be sent directly to the generative AI model:
mrole = "system"
mcontent = "You are an assistant who executes the tasks you are asked to do."
user_role = "user"
task_response =openai_api.make_openai_api_call(
umessage,mrole,mcontent,user_role
)
aug_output=task_response
The response from the OpenAI API call is then returned to the chat_with_gpt function (in node 10 of Figure 4.2):
# Return the augmented output
return aug_output
If the OpenAI API call fails, an exception is raised and returned:
except Exception as e:
# Return the error message in case of an exception
return f"An error occurred: {str(e)}"
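Putting the pieces of this subsection together, the overall control flow of chat_with_gpt can be sketched as below. The OpenAI and Pinecone calls are replaced with stubs (fake_api_call, fake_rag are ours) so the skeleton runs standalone; the real notebook calls openai_api.make_openai_api_call and get_query_results instead:

```python
def fake_api_call(prompt):
    # Stand-in for openai_api.make_openai_api_call(...)
    return f"response to: {prompt}"

def fake_rag(query_text, namespace):
    # Stand-in for get_query_results(...) + display_results(...)
    return f"[{namespace} context]"

def chat_with_gpt(messages, user_message, user_memory=True):
    try:
        # Branch 1: keyword-triggered RAG retrieval.
        if "Pinecone" in user_message or "RAG" in user_message:
            namespace = "genaisys" if "Pinecone" in user_message else "data01"
            qtext = fake_rag(user_message, namespace)
            task_response = fake_api_call(qtext + " " + user_message)
            return namespace + ":" + task_response
        # Branch 2: direct call, with or without conversation memory.
        if user_memory:
            history = " ".join(
                m["content"] for m in messages
                if m["role"] == "user" and "content" in m
            )
            umessage = f"{history} {user_message}"
        else:
            umessage = user_message
        return fake_api_call(umessage)
    except Exception as e:
        return f"An error occurred: {str(e)}"

print(chat_with_gpt([], "Let's search RAG."))
```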
And with that, we have assembled the generative AI functionalities developed across the previous three chapters. At this stage, we’ve built a responsive GenAISys interface and integrated a generative agent, together forming a cohesive AI controller and orchestrator. Let’s now put our GenAISys into motion.
We now have a responsive, event-driven GenAISys capable of executing multiple tasks in diverse ways, as illustrated in Figure 4.4. We will explore the flexibility of this GenAISys interface we built using IPython and assemble the OpenAI and Pinecone components from previous chapters.
Figure 4.4: Summing up the components we have built and assembled in this chapter
Since the functions within GenAISys are event-driven, a user (human or system) or a group of users can leverage this framework to address multiple cross-domain tasks. The system is human-centric, creating a collaborative, frictionless environment between humans and a generative AI agent. Importantly, there is no competition between humans and AI in this framework. Teams can maintain human social relationships with co-workers while using the GenAISys to boost their performance and productivity exponentially. This human-centric approach is one I have always advocated throughout my decades of experience providing AI-driven automation solutions for global corporations, mid-sized businesses, and smaller organizations. When teams adopt AI as a collaborative tool rather than a competitor, it fosters a positive atmosphere that leads to quick wins—demonstrating the combined effectiveness of teamwork and technology.
If we look deeper into how the GenAISys framework can be leveraged in teamwork scenarios, we can establish several fundamental sequences of events typically needed in real-world projects:
These basic sequences constitute a set of sequences, S:
To achieve a goal for a single user or a group of users, the sequences can be assembled as follows:
These sequences can be arranged into longer session flows as required by the specific tasks and scenarios. Because sequences can repeat themselves, we have an indefinite number of possible dynamic combinations. For instance, here’s a glimpse into the flexibility that this provides:
We could add exiting the session and summarizing to these sequences, as well as reloading a saved file and continuing the session. There can also be a repetition of sets, sets with different users, and sets with more functions. In the following chapters, we will add new features, including image generation, audio, web search, and ML, that will expand the scope of the GenAISys framework we have built.
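One way to picture this composability: if each basic sequence in S is a tuple of events, a session is any concatenation of sequences, possibly with repetition. The sequence names and event labels below are invented for illustration, not the book's formal notation:

```python
# Hypothetical encoding of the sequence set S: each basic sequence is a
# tuple of event names, and a session flow is any concatenation of them.
S = {
    "agent_task":    ("input", "agent_checked", "user_history", "genai_agent", "output"),
    "rag_task":      ("input", "agent_checked", "pinecone_query", "genai_agent", "output"),
    "human_message": ("input", "agent_unchecked", "output"),
    "exit":          ("input", "save_history", "end_session"),
}

def compose_session(*names):
    """Concatenate named sequences from S into one session flow."""
    return [event for name in names for event in S[name]]

# A session mixing agent tasks, RAG, direct human messages, and exit.
flow = compose_session("agent_task", "rag_task", "human_message", "exit")
print(len(flow))  # 16
```

Because compose_session accepts any number of names in any order, the space of possible session flows grows without bound, which is the flexibility the text describes.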
In this section, however, we will run a session with two users in a simple sequence of events. Then, we will run a scenario with multiple users and some basic sequences. Let’s begin with a straightforward sequence of events.
In this example session, two users collaborate to brainstorm ideas for attractive travel destinations they could recommend to customers on their online travel website. We start by running an interface session, then display the conversation history, and finally summarize the discussion. To begin the session, open Event-driven_GenAISys_framework.ipynb and run these sections of cells:
Due to the stochastic nature of generative AI models, the outputs might vary slightly with each execution. Likewise, minor differences may appear between this notebook and the printed chapter, as multiple runs are performed during quality control.
With the conversation initialized, let’s now run the interactive session.
The conversation starts with User01 by default, displaying the input box and the activated agent checkbox. The sequence of events and functions triggered in this scenario is illustrated in Figure 4.5.
Figure 4.5: The GenAI agent performs a task with the user’s history
The flow follows this sequence:
User selection => Input => Agent checked => User history => GenAI agent => Output
To the user, this process is seamless, as illustrated in Figure 4.6. However, the underlying functions required careful design and development to produce this smooth effect.
Figure 4.6: UI with the GenAI agent checked
From the user’s perspective, the process is straightforward. User01 types a prompt into the input box: What is the capital of France?.
Figure 4.7: User entering a simple prompt
The output will be displayed above the input widget, as shown here:
Figure 4.8: Output is displayed above the input box
At this point, even an untrained user can intuitively run the GenAISys with the basic information provided in a one-page document. User01 continues with a follow-up question:
User01: What is there to visit?
Since the user history option is set to True in the conversational agent, the agent knows that the question is about Paris and provides information on some iconic locations to visit:
Agent: The capital of France is Paris. There are numerous attractions to visit in Paris, including: 1. **Eiffel Tower** - An iconic symbol of Paris, offering stunning views of the city. 2. **Louvre Museum** - The world's largest art museum, home to the Mona Lisa…
User02 now enters the dialogue by selecting User02 in the drop-down menu, as shown in Figure 4.9. User02 then asks what the capital of Spain is, the agent responds, and the output is displayed above the input box.
Figure 4.9: Input: Another user can enter the conversation
At this point, the GenAISys core framework is set, and we will progressively enhance it in subsequent chapters by addressing features such as security, user access controls, and data privacy. User02 goes further and asks about the places worth visiting. The agent responds correctly because the user history option is activated:
User02: What is there to visit?
Agent: The capital of Spain is Madrid. There are many attractions to visit in Madrid, including: 1. **The Royal Palace of Madrid** - The official residence of the Spanish royal family, although it is now used only for state ceremonies. 2. **Prado Museum** …
Now, a key feature comes into play. User02 deactivates the AI agent, as shown in Figure 4.10, and directly addresses User01, similar to an interaction on social media or remote collaboration platforms.
Figure 4.10: Direct communication in a team without an AI agent
User01 responds informally: “Yes, that’s a good idea, but let’s think it over.” Subsequently, User01 ends the session by typing exit. This capability, as illustrated in Figure 4.11, takes our GenAISys to a new level for the use cases we will explore in this book, such as the following configurations:
Figure 4.11: A team communicates directly and then ends the session
Upon exiting, the session ends, and the conversation history is saved to the conversation_history.json file:
Figure 4.12: Saving and concluding the conversation
Like all other features in this framework, the exit behavior can be customized for individual projects. Take the following examples:
- Displaying a "…to the next cell" message is optional and up to you.
These are decisions to make for each use case. They will not modify the overall framework of the GenAISys but allow for a high level of customization.
In this case, the team wants to display the conversation they just had.
The code for this function is a standard IPython display function to convert the JSON file, conversation_history.json, into Markdown format. Let’s first check whether the conversation history parameter and/or the summary parameter is activated:
display_conversation_history=True
summary=True
In this case, the conversation history and the summary function are both activated. Now, we will check whether a conversation history file is present or not:
import json
from IPython.display import display, Markdown
import os
if display_conversation_history == True or summary==True:
# File path
file_path = 'conversation_history.json'
# Check if the file exists
if os.path.exists(file_path):
display_conversation_history=True
summary=True
print(f"The file '{file_path}' exists.")
else:
display_conversation_history=False
summary=False
print(f"The file '{file_path}' does not exist.")
print("The conversation history will not be processed.")
If the file exists, both display_conversation_history and summary will be set to True (even if they were previously set to False). A message will signal that the file exists:
The file 'conversation_history.json' exists.
If display_conversation_history==True, then the conversation will be displayed:
# Display option
if display_conversation_history==True:
# File path
file_path = 'conversation_history.json'
# Open the file and read its content into the 'dialog' variable
with open(file_path, 'r', encoding='utf-8') as file:
dialog = json.load(file) # Parse JSON content
…
# Function to format JSON content as markdown
def format_json_as_markdown(data, level=0):
html_output = ""
indent = " " * level
…
return html_output
# Format the JSON into markdown
formatted_markdown = format_json_as_markdown(dialog)
# Display formatted JSON as Markdown
display(Markdown(formatted_markdown))
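The body of format_json_as_markdown is elided above. As a rough illustration of what such a recursive formatter does, here is a minimal sketch of one possible body; the notebook's real implementation may differ in its details (indentation width, Markdown styling, variable names):

```python
def format_json_as_markdown(data, level=0):
    """Recursively render parsed JSON as indented Markdown text.

    Sketch only: dictionaries become bold keys with nested values,
    lists are walked in order, and scalars become plain lines.
    """
    indent = "  " * level
    output = ""
    if isinstance(data, dict):
        for key, value in data.items():
            output += f"{indent}**{key}:**\n"
            output += format_json_as_markdown(value, level + 1)
    elif isinstance(data, list):
        for item in data:
            output += format_json_as_markdown(item, level)
    else:
        output += f"{indent}{data}\n"
    return output

# Invented sample matching the conversation_history.json structure.
dialog = {"User01": [{"role": "user", "content": "What is the capital of France?"}]}
print(format_json_as_markdown(dialog))
```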
The output is nicely formatted:
…
User01:
role:
user
content:
What is the capital of France?
role:
assistant
content:
The capital of France is Paris.
…
Content:
The capital of Spain is Madrid.
role:
user
content:
What is there to visit?
role:
assistant
content:
The capital of Spain is Madrid. There are many attractions to visit in Madrid, including:
The Royal Palace of Madrid – …
The team has displayed the conversation but wants to take the process further and summarize this online meeting that included an AI agent as a participant.
The conversation we are summarizing shows how to merge an AI agent into an existing human team to boost productivity. In some cases, the GenAISys will have worked on automated tasks alone. In other cases, the GenAISys will be the copilot of one or several users. In others, in the many critical moments of the life of an organization, teams of humans and AI agents will be able to work together to make decisions.
In this section, we will ask the AI agent to summarize the conversation. We will integrate this feature as a function in the GenAISys in the following chapters. For the moment, we will run it separately after displaying the conversation, as shown in Figure 4.13.
Figure 4.13: Displaying and summarizing a conversation
The code first loads the conversation_history.json file as in the display function. Then, we define a function that converts the conversation history content into an optimal format for the OpenAI API:
# Function to construct dialog string from the JSON conversation history
def construct_dialog_for_summary(conversation_history_json):
dialog = ""
for user, messages in conversation_history_json.items():
dialog += f"\n{user}:\n"
for message in messages:
role = message["role"]
content = message["content"]
dialog += f"- {role}: {content}\n"
return dialog
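A quick usage check of the function above, with a small sample history invented for illustration, shows the flat, role-annotated transcript it produces:

```python
def construct_dialog_for_summary(conversation_history_json):
    # Same logic as in the notebook: one heading per user, then one
    # "- role: content" line per message.
    dialog = ""
    for user, messages in conversation_history_json.items():
        dialog += f"\n{user}:\n"
        for message in messages:
            role = message["role"]
            content = message["content"]
            dialog += f"- {role}: {content}\n"
    return dialog

# Invented sample history for the demonstration.
history = {
    "User01": [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}
print(construct_dialog_for_summary(history))
```

This flattened form is easier for the model to summarize than raw JSON because roles and turns are explicit and the per-user grouping is preserved.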
The function is then called to construct the full conversation history:
# Construct the full dialog from the JSON history
formatted_dialog = construct_dialog_for_summary(conversation_history_json)
Now, we prepare the complete message for the custom GenAISys API call built for the system and imported in the OpenAI subsection of the Setting the environment section in our notebook:
# Task to summarize the conversation
mrole = "system"
mcontent = "Your task is to read this JSON formatted text and summarize it."
user_role = "user"
task = f"Read this JSON formatted text and make a very detailed summary of it with a list of actions:\n{formatted_dialog}"
Finally, we call the GenAISys OpenAI function:
# The make_openai_api_call function is called
task_response = openai_api.make_openai_api_call(
task, mrole, mcontent, user_role
)
The API response is then displayed in Markdown format:
from IPython.display import Markdown, display
# Display the task response as Markdown
display(Markdown(task_response))
Now, everything is ready. We can call the summarizing function if summary==True:
if summary==True:
# File path to the JSON file
file_path = '/content/conversation_history.json'
# Check if the file exists before calling the function
if os.path.exists(file_path):
summarize_conversation(file_path)
else:
print(f"File '{file_path}' does not exist. Please provide a valid file path.")
Note that in Google Colab, /content/ is the default directory. So, the following file paths point to the same directory:
file_path = '/content/conversation_history.json' or
file_path = 'conversation_history.json'
In another environment, you may need absolute paths.
The output is a summary of the conversation history that contains an introduction and then a detailed summary. The prompt for this summary can be modified to request shorter or longer lengths. We can also design a prompt asking the generative AI model to target part of the conversation or design any other specific prompt for a given project. In this case, the output is satisfactory:
The JSON formatted text contains interactions between users and an assistant, where users inquire about the capitals of France and Spain and seek recommendations for attractions to visit in these cities. Below is a detailed summary with a list of actions:
User01 Interaction:
1. Question about the Capital of France:
User01 asks for the capital of France.
The assistant responds that the capital of France is Paris.
2. Inquiry about Attractions in Paris:
User01 asks what there is to visit in Paris.
The assistant provides a list of notable attractions in Paris:
1. Eiffel Tower - Iconic symbol and must-visit landmark.
2. Louvre Museum - Largest art museum, home to the Mona Lisa….
By running through the many possible sequences of tasks and events, we have seen the flexibility that the GenAISys offers us. Let’s run a more complex multi-user session.
In this section, we will run a technical session that activates the main functions we have built in the previous chapters and this chapter:
If you haven’t interrupted the previous session, then simply run the Running the interface section in the GenAISys IPython interface cell again in our notebook, which will start a new conversation.
If you are starting from scratch, then to start the session, open Event-driven_GenAISys_framework.ipynb and run the following sections of cells:
We are ready to explore some advanced features of the GenAISys. We will highlight the events and functions that are activated by each prompt. The first sequence in the session is semantic and sentiment analysis.
To perform semantic and sentiment analysis, we will need to run the following sequence orchestrated by the GenAISys as shown in Figure 4.14:
- User01 is the default user at the beginning of a session. We could call this user the "host," depending on the use case.
- User01 enters an input at 2. Input, triggering 3. Agent checked, which is checked by default when the session starts.
- The Pinecone keyword in the prompt triggers a Pinecone query in the instruction-scenario namespace, augments the prompt, and triggers 4. GenAI agent.
Figure 4.14: The sequence of events and functions to perform semantic and sentiment analysis
The prompt that triggers this sequence of functions and events is as follows:
A customer said that our travel agency was pretty good but should have more activities. Let's ask Pinecone for ideas.
The generative AI controller correctly identified Pinecone as a trigger to query the instruction scenario namespace, which GPT-4o used to produce a satisfactory response:
Agent: genaisys:To enhance your travel agency's offerings based on the customer's feedback, … 3. **Gather Data**: Collect data from various sources such as travel blogs, customer reviews, social media, and travel forums. This data can be used to train or query your semantic search model. 4. **Generate Ideas**: Use the results from the semantic search to generate a list of potential activities. For example, if you're looking to expand adventure activities, the search might suggest zip-lining, rock climbing, or guided hiking tours….
Note that the AI agent begins the response with genaisys, signaling that the proper namespace was queried. This is an optional quality control feature that is good practice to implement when developing the GenAISys.
The prompt 2 sentiment analysis sequence represented in Figure 4.15 is identical to the semantic analysis sequence with two differences:
- User02 is selected.
Figure 4.15: Task orchestration for sentiment analysis
The prompt contains the Pinecone keyword to activate a Pinecone query and the word sentiment, which is an indicator for similarity search:
`A customer said that our travel agency was worse than our competition and should have better service. Let's ask Pinecone what its sentiment is.`
我们可以为所有请求的任务添加一个明确的下拉列表,在某些情况下我们也会这样做。但目前,我们依赖于隐式的关键词搜索。显式(例如,从下拉列表中选择任务)和隐式(使用关键词或依赖提示内容)之间的平衡应该在与用户的研讨会上确定。
We could add an explicit drop-down list for all the tasks requested, and we will in some cases. But in this case, we are relying on implicit keyword searches. The balance between explicit (choosing the task in a drop-down list, for example) and implicit (using a keyword or relying on the content of the prompt) should be decided in workshops with the users.
In this case, the Pinecone keyword is detected, the genaisys namespace is selected, and the output is satisfactory:
Agent: genaisys:Sentiment analysis score: 0.2 Explanation: The text expresses a negative sentiment towards the travel agency. The customer explicitly states that the agency is "worse than our competition" and suggests that it "should have better service." These statements indicate dissatisfaction and a negative comparison to competitors, which justifies a low sentiment score.
The session continues to trigger episodic memory retrieval.
The sequence for episodic memory retrieval will search the Pinecone index for a personal memory of the team members. The memory is related to a past meeting that was saved and upserted to the Pinecone index in the data01 namespace. User03 will trigger the following sequence of functions and events, as represented in Figure 4.16:
- RAG data is retrieved in the data01 namespace, based on the RAG keyword in the prompt.
Figure 4.16: Episodic memory retrieval
The prompt clearly refers to a past event that is related to the personal experience of the team. The trace of this event is an episodic memory stored in the Pinecone index:
The CTO was talking about leveraging different kind of memories the other day. What did he mean by that? Let's search RAG.
The agent’s response is satisfactory:
Agent: data01:RAG, or Retrieval-Augmented Generation, is a framework that combines retrieval-based methods with generative models to enhance the capabilities of AI systems. When your CTO mentioned leveraging different kinds of memories, they were likely referring to the various ways in which information can be stored, accessed, and utilized within such a system. Here are some types of "memories" that might be relevant in this context: 1. **Short-term Memory**:
We can see that the agent correctly found the namespace and also uses the phrase your CTO mentioned, recognizing that this is an event linked to the personal experience of the team, not impersonal semantic data.
The session continues with ideation.
The session continues with ideation. We want to leverage the power of the generative AI model, GPT-4o, in this case, to give us ideas. As such, the sequence will not use the Pinecone index to retrieve instructions or data, as shown in Figure 4.17:
Figure 4.17: GenAISys as an ideation generator
The prompt asks the AI agent for help to get some ideas:
But what do you, the AI Agent, suggest we do to leverage these types of memories in our traveling promotion campaigns?
The AI agent inherits episodic memory since it refers to the CTO’s reflections in the conversation history and now gives its suggestions based on the history of the conversation:
When the CTO mentioned leveraging different kinds of memories, they were likely referring to the concept of using various types of memory systems in artificial intelligence and computing to enhance performance and capabilities.…
…consider the following strategies: 1. **Personalized Recommendations**: Use short-term and long-term memory to analyze customer preferences and past interactions to provide personalized travel recommendations and offers. 2. **Dynamic Content**: Utilize episodic memory to tailor marketing content based on past customer interactions and experiences, making the promotions more relevant and engaging. …
The GenAISys has provided ideas for the team. Now, the team wants to think these ideas over.
The team has now had a short conversation that could have continued as long as they needed to. User01 takes over and communicates directly with the team. The GenAISys is now used as a collaborative remote meeting tool, as shown in Figure 4.18:
- User01 is stepping in.
- User01 unchecks the Agent widget.
- User01 ends the conversation, which is saved.
Figure 4.18: A dialogue without an AI agent
User01 enters a message for the others:
OK. Let's stop here, get a summary, and go see the manager to get some green lights to move ahead.
Figure 4.19 shows that User01 has unchecked the AI agent to send the message and is now ready to end the session by entering exit.
Figure 4.19: The user ends the conversation
The GenAISys displays the conversation ended message, as shown in Figure 4.20.
Figure 4.20: Conversation ends
Quick tip: Need to see a high-resolution version of this image? Open this book in the next-gen Packt Reader or view it in the PDF/ePub copy.
The message instructs the users to proceed to the next cell to display and summarize the conversation.
The display and summarization of a conversation will be integrated into the functions of the GenAISys framework in Chapter 5, Adding Multimodal, Multifunctional Reasoning with Chain of Thought.
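As a rough illustration of what such a display cell can do before Chapter 5 formalizes it, a saved conversation can be rendered as Markdown text. The list-of-dicts history format below mirrors typical OpenAI-style message histories; it is an assumption for illustration, not the book's exact storage format:

```python
# Sketch: render a saved conversation history as Markdown text.
# The history structure is assumed (role/content dicts), matching the
# transcript excerpts shown in this chapter.
history = [
    {"role": "user", "content": "What memories can we leverage?"},
    {"role": "assistant", "content": "Episodic memory can recall past interactions."},
]

def conversation_to_markdown(history):
    lines = []
    for entry in history:
        lines.append(f"**{entry['role']}**")
        lines.append(f"content: {entry['content']}")
        lines.append("")  # blank line between turns
    return "\n".join(lines)

md = conversation_to_markdown(history)
```

A summarization step would then simply pass `md` to a generative AI call, which is exactly what the notebook's next cell does.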
In this notebook, we will proceed to the next cells as described in the A session with two users section.
The output of the display function provides Markdown text of the conversation:
…assistant
content:
When the CTO mentioned leveraging different kinds of memories, they were likely referring to the concept of…
Episodic Memory: This involves storing information about specific events or experiences. In AI, episodic memory can be used to recall past interactions or events to inform future decisions…
…To leverage these types of memories in your travel promotion campaigns, consider the following strategies:
Personalized Recommendations: Use short-term and long-term memory to analyze customer preferences and past interactions to provide personalized travel recommendations and offers.
Dynamic Content: Utilize episodic memory to tailor marketing content based on past customer interactions and experiences, making the promotions more relevant and engaging….
The summary is interesting because it provides useful suggestions for this online travel agency:
AI Suggestion for Travel Promotion:
1. Personalized Recommendations: Use short-term and long-term memory for personalized travel offers.
2. Dynamic Content: Utilize episodic memory for tailored marketing content.
3. Knowledge-Based Insights: Leverage semantic memory for travel tips and destination information.
4. Real-Time Engagement: Use working memory for real-time customer interactions.
5. Feedback and Improvement: Implement long-term memory systems to analyze feedback and improve campaigns.
We built the fundamental structure of the GenAISys framework we will be enhancing throughout the next chapters. We also ran some basic conversations. Let’s summarize this chapter and move up to the next level.
A complex, event-driven, fast-moving economy requires powerful automation for the hundreds of tasks generated by just-in-time consumer needs. A GenAISys can satisfy those requirements with a responsive interface and generative AI capabilities. The real challenge is providing a dynamic, intuitive system. No matter how much generative AI automates tasks, and it can automate them to a tremendous degree, the final decisions are still made by humans. Humans need to communicate in meetings, whether they are organized physically or online. The challenge then evolves into providing an organization with a multi-user GenAISys.
In this chapter, we first explored a high-level framework for building multi-user, multi-turn, multifunctional, and RAG features. The framework includes real-time memory features and long-term knowledge stored in a vector store. The overall ChatGPT-like system requires a responsive interface and a conversational agent, which we will enhance in the following chapters.
We then built an event-driven GenAISys response interface with IPython. The interface was seamless for an end user, who could operate the system with three widgets. The first widget managed the users' input, the second the active user, and the third an agent checkbox to activate or deactivate the AI conversational agent built with GPT-4o.
Finally, we ran a multi-user, multi-turn GenAISys session centered on traveling for an online travel agency team. The first goal was to run a seamless GenAISys for the users with three widgets. The second goal was to explore the scope of short-term, long-term, semantic, and episodic memory. The third goal was to run RAG to retrieve instructions and data. Finally, the goal was to let the users communicate with or without the AI agent. We concluded the session by saving and summarizing it.
We now have a framework that we can configure and enhance in the following chapters, starting by adding multimodal functions and external extensions to the GenAISys in Chapter 5, Adding Multimodal, Multifunctional Reasoning with Chain of Thought.
At this point in our journey, we’ve built the core framework of our GenAISys. We have a responsive, small-scale, ChatGPT-like interactive interface. We expanded beyond typical one-to-one copilot interactions, creating a collaborative multi-user environment where an AI agent actively participates in discussions. We further extended this human-centric design by integrating RAG, giving our AI agent access to a Pinecone index capable of managing both instruction scenarios and data. Finally, we built a flexible GenAISys that allows users to activate or deactivate the AI agent during collaborative meetings. In short, we have created a human-centric AI system that augments human teams rather than attempting to replace people with machine intelligence.
However, despite its human-centric nature, the exponential growth of global transcontinental supply chains and the vast daily flow of goods, services, and digital content require significant levels of automation. For example, we cannot realistically expect social media platforms such as Meta, X, or LinkedIn to employ millions of people to moderate billions of messages—including images, audio, and video files—every day. Similarly, companies such as Amazon cannot manage millions of online transactions and physical deliveries exclusively through human efforts. Automation is essential to augment human decision-making and reasoning, particularly for critical tasks at scale. Therefore, in this chapter, we will extend the GenAISys framework by adding multimodal capabilities and reasoning functionalities. To address the challenges of cross-domain automation, we will implement image generation and analysis and begin integrating machine learning. Our objective is to build a new agentic AI layer into our GenAISys.
We will begin by outlining features that we are integrating into our existing GenAISys framework. Given the broadening scope of our GenAISys, we will introduce chain-of-thought (CoT) reasoning processes to orchestrate and manage complex tasks effectively. We will then incorporate computer vision capabilities. This includes building an image generation function with DALL-E and an image analysis function using GPT-4o. Next, we will add audio functionality for those who prefer voice interactions—using speech to text (STT) for input prompts and text to speech (TTS) for responses. Lastly, we’ll introduce a decision tree classifier as a machine learning endpoint within the GenAISys, capable of predicting activities. By the end of this chapter, we will have successfully extended the GenAISys into a fully interactive, multimodal reasoning platform ready to tackle complex cross-domain use cases.
In all, this chapter covers the following topics:
Let’s begin by designing an enhanced interface for our GenAISys with additional AI capabilities.
So far, the GenAISys framework we’ve developed is event-driven, activated by user inputs (human- or system-generated) that trigger specific AI agent functions. In this chapter, we’ll expand the GenAISys by adding several new capabilities:
Let’s start by examining the expanded GenAISys architecture shown in Figure 5.1:
Figure 5.1: Architecture of the enhanced GenAISys interface
This figure (which is an extended version of Figure 4.1 from the previous chapter) highlights the new capabilities we’ll integrate into our GenAISys:
Reminder
The decision to present the main components of the GenAISys architecture without arrows is a deliberate choice designed to convey a core concept: modularity and architectural flexibility. The figure is not a rigid blueprint but rather a conceptual toolkit. It shows you the powerful components at your disposal—I1. AI controller, I2. Multi-user chatbot, F1. Generative AI model, F2. Memory retention, F3. Modular RAG, and F4. Multifunctional capabilities—as independent, interoperable blocks.
We are expanding the functionality of GenAISys as built in Chapter 4 by adding new layers rather than replacing existing components. Our emphasis here is on enhancement and seamless integration. The following figure provides a high-level flowchart demonstrating how the additional capabilities will integrate into our existing GenAISys architecture:
Figure 5.2: Flowchart of additional functions to the GenAISys
The following additional functions will be integrated into our existing GenAISys interface:
VBox interface, displaying reasoning steps clearly whenever the AI agent utilizes CoT logic
To achieve this expanded functionality, we will develop the following key features:
Although we are adding several new functions, including reasoning functionality (CoT), we will introduce only a single new package installation, gTTS, to minimize complexity in this chapter. Our primary focus remains on building a reliable architecture with optimal dependency management. To begin, let’s explore the updated elements of the IPython interface and the enhancements to the AI agent.
The GenAISys architecture we’ve developed can now be viewed as comprising three interconnected layers, as shown in Figure 5.3. These enhancements blur the lines between orchestration, control, and agent functionality, as these roles are now distributed across multiple layers:
Figure 5.3: The three layers of the event-driven GenAISys
This high-level architecture integrates orchestrators, controllers, and agents, each broken down into specific Python functionalities. Let’s start by exploring Layer 1, the IPython interface, from a functional standpoint.
The IPython interface now incorporates three new features (highlighted in yellow in Figure 5.4): a voice widget, a file-handling widget, and a dedicated reasoning interface triggered by user inputs and AI agent activities. These enhancements bring the interface total to six interactive widgets and functions.
Figure 5.4: Voice, file, and reasoning features are added to the IPython interface
Let’s go through each widget and function:
chat_with_gpt).
For TTS, the gTTS service is utilized and controlled via a checkbox set to False by default:
# Create a checkbox to toggle text-to-speech
tts_checkbox = Checkbox(
value=False,
description='Voice Output',
layout=Layout(width='20%')
)
If the AI agent’s checkbox is checked, then the TTS function is called:
if agent_checkbox.value:
    …
    if tts_checkbox.value:
        text_to_speech(response)
The resulting MP3 file (response.mp3) is automatically played in the update_display() function:
def update_display():
    …
    # Audio display
    if os.path.exists("/content/response.mp3"):
        display(Audio("/content/response.mp3", autoplay=True))
        !rm /content/response.mp3
chat_with_gpt. It is controlled via another checkbox, initially set to False:
# Create a checkbox to toggle the file display
files_checkbox = Checkbox(
    value=False,
    description='Files',
    layout=Layout(width='20%')
)
If an image exists, it is displayed with the Python Image Library (PIL) in the update_display() function:
if os.path.exists("/content/c_image.png") and files_checkbox.value==True:
    # Open the image using PIL
    original_image = PILImage.open("/content/c_image.png")
    # Resize the image to 50% of its original size
    new_size = (original_image.width //2, original_image.height//2)
    resized_image = original_image.resize(new_size)
    # Display the resized image
    display(resized_image)
# Create an output widget for reasoning steps
reasoning_output = Output(
    layout=Layout(border="1px solid black", padding="10px",
                  margin="10px", width="100%")
)
The widget receives outputs from the CoT process and displays them persistently in the update_display() function, independently of the VBox:
def update_display():
    …
    # Display reasoning_output persistently
    display(reasoning_output)
    …
The VBox interface now contains all interactive widgets, including the newly added TTS and files widgets:
if conversation_active:
    display(
        VBox(
            [user_selector, input_box, agent_checkbox,
             tts_checkbox, files_checkbox],
            layout=Layout(display='flex', flex_flow='column',
                          align_items='flex-start', width='100%')
        )
    )
Given the length and complexity of responses from the AI agent (especially during CoT processes), we introduced an enhanced formatting feature using Markdown. The update_display() function now formats entries clearly, calling a dedicated formatting function:
def update_display():
    clear_output(wait=True)
    for entry in user_histories[active_user]:
        formatted_entry = format_entry(entry)
        display(Markdown(formatted_entry))
The format_entry(entry) function formats the user’s (blue) and assistant’s (green) responses, ensuring readability:
def format_entry(entry):
    """Format the content of an entry for Markdown display."""
    if entry['role'] == 'user':
        formatted_content = format_json_as_markdown(entry['content']) \
            if isinstance(entry['content'], (dict, list)) \
            else entry['content']
        formatted_content = formatted_content.replace("\n", "<br>")  # Process newlines outside the f-string
        return f"**<span style='color: blue;'>{active_user}:</span>** {formatted_content}"
    …
    elif entry['role'] == 'assistant':
        formatted_content = format_json_as_markdown(entry['content'])
        …
        return f"**<span style='color: green;'>Agent:</span>** {formatted_content}"
This design emphasizes that the IPython interface (Layer 1) is purely to orchestrate user interactions and trigger underlying layers of functions and agents. This architecture ensures that you have the flexibility you need if you want to call the functions and agents directly without a user interface.
With the IPython interface described, let’s explore the enhanced capabilities in Layer 2, the AI agent.
The AI agent invoked by the IPython interface in Layer 1 remains the chat_with_gpt function, reinforcing the conversational nature of GenAISys. With the introduction of reasoning capabilities, the conversation can now occur directly between AI agents as well.
The chat_with_gpt function has been expanded with several new features. If necessary, review the core functionalities described in Chapter 4.
Let’s explore the new enhancements added to the AI agent:
continue_functions=True has been introduced at the beginning of the function to ensure that only one requested task is executed at a time.
continue_functions is set to False at the end of the Pinecone query process, triggered by the presence of the Pinecone keyword in the user message. This stops any additional unintended task executions.
reason.chain_of_thought_reasoning, described later in the Reasoning with CoT section, is called under specific conditions:
if "Use reasoning" in user_message and "customer" in user_message and "activities" in user_message and continue_functions==True:
The continue_functions==True condition ensures the reasoning function is called with the initial user query. A sample customer activities file is also downloaded as part of this process:
initial_query = user_message
download("Chapter05","customer_activities.csv")
reasoning_steps = reason.chain_of_thought_reasoning(initial_query)
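Stripped of the surrounding notebook plumbing, this trigger pattern amounts to keyword matching plus a one-shot flag. The sketch below illustrates the dispatch logic with hypothetical handler names; it is a simplified illustration, not the book's actual chat_with_gpt implementation:

```python
# Minimal sketch of keyword-triggered dispatch with a one-shot flag.
# The handler names appended here are stand-ins for the reason.* calls.
def dispatch(user_message):
    continue_functions = True  # only one task may run per message
    outputs = []

    # Reasoning trigger: all three keywords must be present
    if ("Use reasoning" in user_message and "customer" in user_message
            and "activities" in user_message and continue_functions):
        outputs.append("chain_of_thought_reasoning")
        continue_functions = False  # block further task executions

    # Image-generation trigger (hypothetical keyword for illustration)
    if "Create an image" in user_message and continue_functions:
        outputs.append("generate_image")
        continue_functions = False

    return outputs
```

The flag guarantees that even a message matching several triggers executes only the first matching task, which is exactly the behavior continue_functions enforces in chat_with_gpt.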
In the example use case for this chapter, a team can automatically access and query a regularly updated customer activity data source. The sample file provided contains 10,000 records of historical customer activities, including customer IDs, locations, activity types, and activity ratings:
Figure 5.5: The customer ratings of historical sites
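The exact schema of customer_activities.csv is not reproduced here, but based on the description (customer IDs, locations, activity types, and activity ratings), a small synthetic stand-in can be sketched as follows. The column names are assumptions for illustration; the real file ships with the book's repository:

```python
import csv
import io
import random

# Build a small synthetic stand-in for customer_activities.csv.
random.seed(0)
locations = ["Rome", "Paris", "Tokyo"]
activities = ["museum", "beach", "historical site", "restaurant"]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["customer_id", "location", "activity", "rating"])
for i in range(100):  # the real file has 10,000 records
    writer.writerow([f"C{i:04d}", random.choice(locations),
                     random.choice(activities), random.randint(1, 5)])

rows = buffer.getvalue().strip().splitlines()
```

A stand-in like this is handy for testing the downstream classifier without downloading the full dataset.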
A decision tree classifier later utilizes this dataset within the CoT reasoning function to predict the most popular customer activity. Once the response is generated, it is added to the output, and continue_functions is set to False:
aug_output=reasoning_steps
continue_functions=False
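Conceptually, the CoT function produces an ordered list of reasoning steps, one of which delegates to a machine learning endpoint. The following is an illustrative sketch of that shape, not the book's actual reason.chain_of_thought_reasoning; the step wording and the mock classifier are assumptions:

```python
# Illustrative CoT sketch: each step is recorded so the interface can
# display the reasoning trace; one step delegates to an ML endpoint.
def mock_ml_agent(location, feature_column):
    # Stand-in for the decision tree classifier in machine_learning.py
    return "historical site"

def chain_of_thought_steps(initial_query, location="Rome"):
    steps = []
    steps.append(f"Step 1: Received query: {initial_query}")
    steps.append("Step 2: Loading customer_activities.csv")
    prediction = mock_ml_agent(location, "activity")
    steps.append(f"Step 3: Predicted top activity for {location}: {prediction}")
    steps.append("Step 4: Generating the final response")
    return steps

steps = chain_of_thought_steps("Use reasoning for customer activities")
```

Returning the steps as data, rather than printing them, is what lets the reasoning_output widget display the trace persistently.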
A new function, reason.generate_image, which we will implement in the Image generation and analysis section, has also been integrated. It is called as follows:
prompt = user_message
image_url = reason.generate_image(prompt, model="dall-e-3",
    size="1024x1024", quality="standard", n=1)
The generated image URL is returned, and the image itself is downloaded and saved locally for display or further processing:
# Save the image locally
save_path = "c_image.png"
image_data = requests.get(image_url).content
with open(save_path, "wb") as file:
    file.write(image_data)
A corresponding message is added to the output, and the continue_functions flag is set to False:
aug_output="Image created"
continue_functions=False
openai_api.make_openai_api_call is now renamed reason.make_openai_api_call. It maintains the same functionality as in Chapter 4 but is now part of the GenAISys reasoning library. The memory management if user_memory…else condition, which takes the complete user history or just the present user message into account, has been updated with explicit conditions that check both the state of user_memory and the continue_functions flag:
if user_memory==False and continue_functions==True:
…
if user_memory==True and continue_functions==True: …
The AI agent thus acts as an intermediate orchestrator, calling and managing the execution of lower-layer functions rather than executing them directly. The IPython interface remains the top layer that invokes the AI agent, which in turn interacts with the specific functions within Layer 3.
In this layer, our focus is on the new functionalities introduced to enable advanced reasoning through the CoT cognitive agent. Pinecone indexing and standard OpenAI calls remain as implemented in Chapter 4. The primary additions in this chapter are as follows:
We will explore these functionalities in the upcoming sections of this chapter, as follows:
By the end of this chapter, our enhanced three-layer GenAISys will have new, robust capabilities designed to expand even further in subsequent chapters. Let’s now dive deeper into these enhancements, beginning with the environment setup.
In this section, we will enhance, expand, and rearrange the environment previously built to finalize the GenAISys framework. These changes are essential for the upcoming use cases in subsequent chapters. Open the Multimodal_reasoning_with_Chain_of_Thought.ipynb notebook within the Chapter05 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main).
Regarding package installations, the Setting up the environment section in the notebook remains largely unchanged from the previous chapter (Event-driven_GenAISys_framework.ipynb), with just one new addition: Google Text-to-Speech (gTTS).
However, several significant updates have been made to support the CoT generative AI reasoning features. Let’s examine each of these updates, starting with the OpenAI section.
The first two files we download remain the same as in previous chapters. The third and fourth files, however, are new and have been added to support advanced functionality:
from grequests import download
download("commons","requirements01.py")
download("commons","openai_setup.py")
download("commons","reason.py")
download("commons","machine_learning.py")
reason.py now contains the generative AI library with the functions built in the previous chapters and the ones we are adding in this chapter. These functions in the generative AI library and their status are as follows:
make_openai_api_call(input, mrole, mcontent, user_role) is a general-purpose OpenAI API call described in the Setting up the environment section of Chapter 1. It is now imported as follows:
from reason import make_openai_api_call
image_analysis is the image analysis function that can describe an image or use the image as a starting point to generate content such as a story. This function is described in the Image generation and analysis section of this chapter.
generate_image is a new function that generates images with DALL-E, detailed in the Image generation and analysis section of this chapter.
chain_of_thought_reasoning is a new CoT logic function of the GenAISys we are building. We will implement it in the Reasoning with CoT section of this chapter. It can call functions from other libraries, such as machine_learning.
machine_learning.py will now contain a decision tree classifier in a function named ml_agent. The function takes two arguments:
ml_agent(feature1_value, feature2_column)
In our example use case, feature1_value will represent a customer location, and feature2_column will represent customer activities. The ml_agent classifier will predict the most popular customer activity for a specific location based on historical data.
We import ml_agent from machine_learning.py as follows:
# Import the function from a custom machine learning file
import os
import machine_learning
from machine_learning import ml_agent
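machine_learning.py itself is part of the book's repository, but a minimal decision tree classifier with the same two-argument signature might look like the following sketch. The DataFrame contents and column names are synthetic assumptions standing in for the historical data:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the historical customer activity data
df = pd.DataFrame({
    "location": ["Rome", "Rome", "Rome", "Paris", "Paris"],
    "activity": ["historical site", "historical site", "museum",
                 "restaurant", "restaurant"],
})

def ml_agent(feature1_value, feature2_column):
    """Predict the most popular value of feature2_column for a location."""
    loc_enc = LabelEncoder().fit(df["location"])
    act_enc = LabelEncoder().fit(df[feature2_column])
    X = loc_enc.transform(df["location"]).reshape(-1, 1)
    y = act_enc.transform(df[feature2_column])
    clf = DecisionTreeClassifier().fit(X, y)
    pred = clf.predict(loc_enc.transform([feature1_value]).reshape(-1, 1))
    return act_enc.inverse_transform(pred)[0]
```

With only the location as a feature, the tree's leaf for each location predicts its majority activity, which matches the "most popular activity per location" behavior described above.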
The remaining OpenAI setup subsections, including package installation and API key initialization, remain identical to previous chapters. Let’s now initialize our new functionalities.
We will initialize the following new functions:
!pip install gTTS==2.5.4, which is an open source, free TTS library that fits prototyping purposes: https://pypi.org/project/gTTS/. `click`, a command-line library, is required for gTTS. The first cell checks if we wish to use gTTS by setting use_gtts to True:
use_gtts = True  # activates Google TTS in Google Colab if True and deactivates if False
The second cell of the notebook will check for and set up the correct `click` version if use_gtts is set to True. If an update is needed, it will then display a clear message in the notebook output prompting you to manually restart the runtime. After restarting, simply click `Run All` to continue. The code will display an HTML message to restart if the version is updated:
import importlib.metadata
from IPython.display import display, HTML  # Required for the message

# ... (define required_click_version, current_click_version, and html_message as in your code) ...

if current_click_version != required_click_version:
    # --- Commands to uninstall and install 'click' would go here ---
    # Example: !pip uninstall -y click
    # Example: !pip install click==8.1.8
    # Display the styled message prompting for manual restart
    display(HTML(html_message))
    # Stop the Python cell execution gracefully, prompting restart
    raise SystemExit("Please restart the Colab runtime to apply changes.")
else:
    print(f"--- 'click' is already at the correct version ({required_click_version}). No action needed. ---")
If use_gtts is set to True, we install gTTS and define a TTS conversion function:
# use_gtts activates Google TTS in Google Colab if True and deactivates if False
if use_gtts:
    !pip install gTTS==2.5.4
    from gtts import gTTS
    from IPython.display import Audio

def text_to_speech(text):
    # Convert text to speech and save as an MP3 file
    if use_gtts:
        if not isinstance(text, str):
            text = str(text)  # Making sure the text is a string, not a list
        tts = gTTS(text)
        tts.save("response.mp3")
This function will be activated in the IPython interface when the AI agent returns a response, as explained earlier in the Layer 1: IPython interface section.
machine_learning.py:
# Import the function from the custom machine learning file
import os
import machine_learning
from machine_learning import ml_agent
This decision tree classifier function will predict popular customer activities based on historical data, enhancing our GenAISys’s predictive capabilities.
reason.py:
# Import the function from the custom OpenAI API file
import os
import reason
from reason import chain_of_thought_reasoning
The Pinecone installation, initialization, and queries are then defined as explained in Chapters 3 and 4. Take some time to revisit those chapters if needed, as we will reuse the functions previously developed. We’re now prepared to build the image generation and analysis functions.
In this section, we will begin by creating a flexible image generation function using OpenAI’s DALL-E model. Following that, we’ll build a function for image analysis. The objective is to enhance GenAISys with computer vision capabilities while preserving its responsive, event-driven functionality, as illustrated in Figure 5.6:
Figure 5.6: Generating images with flexible event-driven triggers
The preceding figure is an evolution of the architecture we first developed in Chapter 4. It has been augmented to include new capabilities: activation of speech (voice) features, management of image files, enhanced display functionality, and reasoning through CoT. In this section, our focus will specifically be on integrating and demonstrating computer vision capabilities alongside the enhanced display functionality.
The image generation and analysis processes are designed to be flexible:
Note that the interface will only display image files if the Files widget is checked. Let's now dive deeper into how these vision features are integrated within the GenAISys interface. Specifically, we'll demonstrate the scenario where the Files checkbox is activated (checked), as depicted in Figure 5.7:
Figure 5.7: The Files checkbox is checked so that the image will be displayed
With the Files checkbox selected, the image generated by DALL-E in response to the user’s prompt will be immediately displayed, as shown in Figure 5.8:
Figure 5.8: Entering a prompt and displaying the image generated
If the Files option is not checked, the image will be generated and saved but not displayed. Similarly, image display functionality also applies to analyzing images downloaded from external sources. When the Files checkbox is unchecked, the analysis runs without visually displaying the image. We are now ready to examine the implementation details of the image generation function.
The function to generate an image is located in the custom generative AI library, reason.py, in the commons directory. A user prompt or a CoT framework can trigger this function. The name of the function is generate_image, and it takes five arguments:
def generate_image(
prompt, model="dall-e-3", size="1024x1024", quality="standard", n=1
):
The five arguments are as follows:

- prompt: The query related to the image that is provided by the user or the system.
- model: The OpenAI model to use. In this case, the default value is dall-e-3.
- size: The size of the image. The default size of the image is 1024x1024.
- quality: Defines the quality of the image. The default value is standard, which costs less than the higher-quality hd option.
- n: Defines the number of images to generate. The default value is 1.

The function returns the URL of the generated image. The code first initializes the OpenAI client:
def generate_image(
    prompt, model="dall-e-3", size="1024x1024", quality="standard", n=1
):
    # Initialize the OpenAI client
    client = OpenAI()
The DALL-E model is then called via the OpenAI API with the specified parameters:
# Generate the image using the OpenAI API
response = client.images.generate(
model=model,
prompt=prompt,
size=size,
quality=quality,
n=n,
)
The parameters are described in detail in Chapter 1 in the Setting up the environment section.
The URL of the image is extracted from response and returned:
# Extract and return the image URL from the response
return response.data[0].url
Once an image has been generated or retrieved, we can choose to display or analyze it, depending on our needs.
The function to analyze an image is also located in the custom generative AI library, reason.py, in the commons directory. This function, named image_analysis, is defined as follows, and takes three arguments:
def image_analysis(image_path_or_url, query_text, model="gpt-4o"):
The three arguments are as follows:

- image_path_or_url (str): The path to access a local image file or the URL of the image.
- query_text (str): The query related to the image that is provided by the user or the system.
- model (str): The OpenAI model to use. In this case, the default value is gpt-4o, which possesses vision capabilities (generation and analysis).

The function initializes the content structure for the API call with the provided query text:
# Initialize the content list with the query text
content = [{"type": "text", "text": query_text}]
The function then searches for the image in a URL or a local file:
if image_path_or_url.startswith(("http://", "https://")):
    # It's a URL; add it to the content
    content.append({"type": "image_url",
                    "image_url": {"url": image_path_or_url}})
else:
    # It's a local file; read and encode the image data
    with open(image_path_or_url, "rb") as image_file:
        image_data = base64.b64encode(
            image_file.read()).decode('utf-8')
If the image is in a URL, it is appended to the content. If the image is a local file, it is encoded in Base64 and formatted as a UTF-8 string. This format enables embedding the image data within text-based systems (such as JSON or HTML). A data URL is then created and appended to the content:
# Create a data URL for the image
data_url = f"data:image/png;base64,{image_data}"
content.append({"type": "image_url", "image_url": {"url": data_url}})
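To see the encoding step in isolation, here is a self-contained sketch that builds a data URL from a placeholder byte string (a stand-in for real PNG bytes):

```python
import base64

# Placeholder bytes standing in for the contents of a real PNG file
fake_image_bytes = b"\x89PNG-placeholder"

# Encode to Base64 and decode the result into a UTF-8 string
image_data = base64.b64encode(fake_image_bytes).decode("utf-8")

# Embed the encoded image in a data URL, as done for the API content list
data_url = f"data:image/png;base64,{image_data}"
print(data_url[:22])  # data:image/png;base64,
```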
The OpenAI message is created with the context that contains the query information and the image:
# Create the message object
messages = [{"role": "user", "content": content}]
The API call includes a set of standard parameters, detailed in Chapter 1 (in the Setting up the environment section):
# Define the parameters
params = {
    "max_tokens": 300,
    "temperature": 0,
    "top_p": 1,
    "frequency_penalty": 0,
    "presence_penalty": 0,
}
Once the content, messages, and parameters are defined, the OpenAI API is called:
# Make the API call
response = client.chat.completions.create(
model=model,
messages=messages,
**params # Unpack the parameters dictionary
)
For further integration, particularly with RAG using Pinecone in Chapter 6, the response is saved as text in a file. This enables subsequent use and retrieval:
# Save the result to a file
with open("image_text.txt", "w") as file:
    file.write(response.choices[0].message.content)
return response.choices[0].message.content
This image_analysis function will also be called by the CoT reasoning process built later in this chapter, where query_text will be dynamically created and passed into the function:
response = image_analysis(image_url, query_text)
We now have fully functional computer vision components integrated into our GenAISys. With these capabilities, we are ready to build the CoT reasoning process.
The exponential acceleration of global markets has led to billions of micro-tasks being generated daily across platforms such as social media, e-marketing sites, production lines, and SaaS platforms. Without robust automation, keeping pace with these real-time demands is impossible. Speed and efficiency have become paramount, requiring tasks to be executed in real time or near-real time. Recent advances in AI have significantly helped us adapt to these market paradigms, where we must handle an increasing volume of tasks in increasingly shorter timeframes. However, as we increase the number and scope of AI functions to solve problems, it is becoming confusing for users to run complex scenarios with copilots. It is also quite challenging for a team of developers to create a GenAISys that contains the functions they need and includes a clear and intuitive sequence of operations for problem-solving.
In this section, we address these challenges by implementing CoT reasoning. CoT reasoning breaks complex tasks into smaller, more manageable steps where the output of one step becomes the input for the next. This process mimics (without replacing) human-like reasoning. It reduces cognitive overload for users, allowing them to focus primarily on decision-making. Additionally, CoT reasoning makes the AI agent’s internal thought process transparent, providing real-time explainability of each reasoning step.
The goal of this section is to build a CoT reasoning process using Python, leveraging the flexible and interactive GenAISys framework we’ve developed. Specifically, we will apply CoT to simulate customer-preference analysis for an online travel platform, generate creative suggestions for activities, produce images using DALL-E, and create storytelling narratives based on these images with GPT-4o.
At first glance, a CoT cognitive agent might seem similar to traditional sequences of functions found in classical software development. Hence, let’s first clarify the important distinctions between them before we dive into the code.
Seasoned software developers are used to implementing complex sequences of functions. To bridge the conceptual gap between traditional software sequences and cognitive CoT reasoning (which mimics rather than replaces human cognition), let’s first distinguish their purposes clearly:
Another critical aspect of CoT is its intermediate reasoning:
CoT reasoning offers cognitive alignment closer to human thinking patterns. We humans break monolithic problems into smaller parts, process each part, and then assemble the intermediate conclusions to reach a global solution. The human-like framework of the CoT process we are building in this chapter makes the GenAISys more intuitive and creative, mimicking (not replacing) human problem-solving methods. In the following chapters, notably in Chapter 6, we’ll further expand and enhance the CoT reasoning capabilities. The takeaway here is that CoT involves sequences of tasks, but in a more flexible and creative way than classical non-AI or AI sequences. Let’s move on and define the cognitive flow of CoT reasoning.
Instead of the traditional term flowchart, we’ll use the term cognitive flow to describe the CoT process we are implementing. This term emphasizes the human-like reasoning and dynamic problem-solving capabilities of our AI agent, differentiating clearly from classical software flowcharts. A classic flowchart provides a visual representation of a sequence of functions. A reasoning CoT cognitive flow or cognitive workflow maps the logical progression of the AI agent’s thought process from one step to another. The cognitive flow shows how the AI agent mimics human reasoning.
Let’s first walk through the cognitive flow we will implement in Python, visualized in Figure 5.9. The Python functions we’ll use reside in reason.py, located in the commons directory, and are described in detail in the OpenAI subsection of this chapter’s Setting up the environment section.
Figure 5.9: Cognitive flow of the CoT process
The cognitive flow for our CoT reasoning process consists of five main phases, orchestrated by the chain_of_thought_reasoning() function. This sequence begins with Start.
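Before examining each phase, the overall chaining pattern can be sketched with stub steps; the stubs below are placeholders for the real ML, GPT-4o, and DALL-E calls described in the rest of this section:

```python
def chain_of_thought_sketch(user_input):
    """Minimal CoT skeleton: each step's output feeds the next step,
    and every step is appended to the reasoning memory."""
    steps = []

    def run_step(description, func, arg):
        steps.append(f"Process: {description}")
        result = func(arg)
        steps.append(f"Result: {result}")
        return result

    # Stub steps standing in for the real endpoints
    ml = run_step("ML baseline", lambda x: f"prediction({x})", user_input)
    ideas = run_step("Suggest activities", lambda x: f"ideas({x})", ml)
    image = run_step("Generate image", lambda x: f"image({x})", ideas)
    story = run_step("Analyze image and tell a story", lambda x: f"story({x})", image)
    return steps, story

steps, story = chain_of_thought_sketch("Rome")
print(story)  # story(image(ideas(prediction(Rome))))
```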
The CoT reasoning process begins when it receives input text provided by the AI agent. The AI agent analyzes the user input and then triggers the CoT function, as described earlier in the Layer 2: AI agent section. At the start of the CoT function, two key initializations occur: the reasoning memory (steps = []) is initialized, and the reasoning display widget is activated within the IPython interactive interface:
steps = []
# Display the reasoning_output widget in the interface
display(reasoning_output)
display(reasoning_output) triggers the display widget, which enables real-time updates in the interactive IPython interface, ensuring the CoT process remains transparent and easily interpretable by users.
The first step, ML-baseline, activates the machine learning endpoint (machine_learning.ml_agent()). It utilizes a decision tree classifier to analyze customer data dynamically and predict activities of interest. The function takes a location (for example, "Rome") and "ACTIVITY" as the target column for the prediction:
# Step 1: Analysis of the customer database and prediction
steps.append("Process: Performing machine learning analysis of the customer database. \n")
with reasoning_output:
    reasoning_output.clear_output(wait=True)
    print(steps[-1])  # Print the current step
    time.sleep(2)  # Simulate processing time
result_ml = machine_learning.ml_agent("Rome", "ACTIVITY")
steps.append(f"Machine learning analysis result: {result_ml}")
This block of code is repeated for each reasoning step:

- steps.append("Process: Performing machine learning analysis of the customer database. \n") appends a description of the step to the reasoning memory step list.
- with reasoning_output: initiates a code block for the display widget.
- reasoning_output.clear_output(wait=True) clears the reasoning_output widget.
- print(steps[-1]) prints the most recent step added.
- time.sleep(2) introduces a two-second delay to simulate processing time.
- result_ml = machine_learning.ml_agent("Rome", "ACTIVITY") calls ml_agent.
- steps.append(f"Machine learning analysis result: {result_ml}") appends the result returned by the machine learning function to the list of steps.
The output from machine_learning.ml_agent, which predicts the top customer-preferred activity for the location "Rome", becomes the input for the subsequent step, suggesting creative activities.
Before moving on to the next step, let’s briefly explore the underlying decision tree classifier inside machine_learning.py.
A decision tree classifier is well suited for our task because it is a machine learning model that makes predictions by splitting data into a tree-like structure based on feature values. It works by recursively choosing the optimal feature at each split until it reaches a defined stopping condition, such as a maximum depth or a minimum sample size per leaf. At each step, the possibilities narrow down until a single prediction emerges.
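With a single encoded feature, a fully grown tree effectively learns the majority activity per location. This stdlib-only sketch (toy data, not the book's dataset) illustrates that behavior:

```python
from collections import Counter, defaultdict

# Toy (location, activity) records standing in for customer_activities.csv
records = [
    ("Rome", "Forum of Rome"), ("Rome", "Forum of Rome"), ("Rome", "Colosseum"),
    ("Athens", "Acropolis"), ("Athens", "Acropolis"),
]

# Group the activities by location
by_location = defaultdict(list)
for location, activity in records:
    by_location[location].append(activity)

# Keep the most common activity per location, as a pure leaf would predict
majority = {loc: Counter(acts).most_common(1)[0][0]
            for loc, acts in by_location.items()}
print(majority["Rome"])  # Forum of Rome
```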
To run it, we first import the required libraries for handling data and building the decision tree. We also disable warnings to avoid cluttering the IPython output:
import pandas as pd
import random
from sklearn.preprocessing import LabelEncoder # For encoding categorical variables
from sklearn.tree import DecisionTreeClassifier # For training the Decision Tree model
import warnings
warnings.simplefilter(action='ignore', category=UserWarning)
Next, we define our classifier function, ml_agent(), with two parameters:
def ml_agent(feature1_value, feature2_column):
The two parameters are the following:

- feature1_value: The value of the location we want to predict activities for.
- feature2_column: The target column ("ACTIVITY") we want to predict.

The function starts by loading the customer activities dataset into a pandas DataFrame:
# Load the dataset from a CSV file into a DataFrame
df = pd.read_csv("customer_activities.csv")
Then, we encode the categorical variables (LOCATION and ACTIVITY) for processing:
# Create LabelEncoder objects for encoding categorical variables
le_location = LabelEncoder()
le_activity = LabelEncoder()
# Encode categorical values
df["LOCATION_ENCODED"] = le_location.fit_transform(df["LOCATION"])
df["ACTIVITY_ENCODED"] = le_activity.fit_transform(df["ACTIVITY"])
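Conceptually, LabelEncoder just assigns each distinct category an integer code (based on sorted order) and can map codes back to labels. A stdlib-only stand-in (ours, for illustration only) behaves like this:

```python
class TinyLabelEncoder:
    """Minimal stand-in for sklearn's LabelEncoder (concept only)."""

    def fit_transform(self, values):
        # Distinct categories in sorted order, each mapped to its index
        self.classes_ = sorted(set(values))
        self._index = {c: i for i, c in enumerate(self.classes_)}
        return [self._index[v] for v in values]

    def transform(self, values):
        return [self._index[v] for v in values]

    def inverse_transform(self, codes):
        return [self.classes_[c] for c in codes]

le = TinyLabelEncoder()
codes = le.fit_transform(["Rome", "Paris", "Rome", "Athens"])
print(codes)  # [2, 1, 2, 0]
print(le.inverse_transform(codes))  # ['Rome', 'Paris', 'Rome', 'Athens']
```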
If no specific location (feature1_value) is provided, the function selects the most frequent location by default:
# Select default location if feature1_value is empty
if not feature1_value.strip():  # If empty string or only spaces
    feature1_value = df["LOCATION"].mode()[0]  # Most common location
We then prepare the features (X) and the target variable (y) from our encoded data:
# Select the encoded 'LOCATION' column as the feature (X)
X = df[["LOCATION_ENCODED"]]
# Select the encoded 'ACTIVITY' column as the target variable (y)
y = df["ACTIVITY_ENCODED"]
With our data prepared, we train the decision tree model:
# Train a Decision Tree Classifier on the dataset
model = DecisionTreeClassifier(random_state=42)
model.fit(X, y)
Setting random_state=42 ensures consistent results each time we run the code. Now, we encode the provided (or default) location input to prepare it for prediction:
# Encode the input location using the same LabelEncoder
feature1_encoded = le_location.transform([feature1_value])[0]
The Python .transform method on the le_location object converts the categorical string into its unique integer code.
The function is now ready to predict the most probable activity and convert it back to its original label. We will use the Python .predict method of our trained model to see what it predicts for this new data point:
# Predict the encoded activity for the given location
predicted_activity_encoded = model.predict([[feature1_encoded]])[0]
# Convert the predicted numerical activity back to its original label
predicted_activity = le_activity.inverse_transform(
[predicted_activity_encoded]
)[0]
Finally, the function constructs a descriptive output message for the customer, tailored to the predicted activity:
# Generate output text
text = (f"The customers liked the {predicted_activity} because it reminded them of how "
f"our democracies were born and how it works today. "
f"They would like more activities during their trips that provide insights into "
f"the past to understand our lives.")
This descriptive output is returned to the CoT function:
return text
To invoke the classifier from the CoT function, we use the following:
result_ml = ml_agent("", "ACTIVITY")
print(result_ml)
We’re letting the classifier find the location and activity. The expected output, in this case, will be the following:
Machine learning analysis result: The customers liked the Forum of Rome because it reminded them of how our democracies were born and how it works today. They would like more activities during their trips that provide insights into the past to understand our lives.
Let’s now use the output of this step to suggest activities.
This step follows the same logic and structure as Step 1. The name of the process is as follows:
steps.append("Process: Searching for activities that fit the customer needs. \n")
The output from Step 1 (result_ml) becomes part of the instruction sent to GPT-4o to augment the input context. The combined query (umessage) for GPT-4o becomes as follows:
umessage = (
"What activities could you suggest to provide more activities and excitement in holiday trips."
+ result_ml
)
At this stage, the instructions are tailored specifically for our travel-focused domain. In Chapter 6, we’ll evolve these instructions to become dynamic event-based variables. Here, we continue using the established GenAISys OpenAI API call we built in earlier chapters:
mrole = "system"
mcontent = (
"You are an assistant that explains your reasoning step by step before providing the answer. "
"Use structured steps to break down the query."
)
user_role = "user"
task_response = make_openai_api_call(umessage, mrole, mcontent, user_role)
The output received from GPT-4o (task_response) will serve as the input for the next step (Step 3). The method of appending and displaying the reasoning steps remains consistent with Step 1.
This step begins by taking the detailed suggestion received from the previous step (task_response) and passing it directly as the prompt to DALL-E’s image generation function. The structure and logic here are consistent with the previous steps, now focused on generating images:
prompt = task_response
image_url = generate_image(prompt)
Once generated, the image is downloaded and saved locally as c_image.png. This image file will then be displayed through the IPython interface if the Files widget is checked, as explained in the Layer 1: IPython interface section:
…
save_path = "c_image.png"
image_data = requests.get(image_url).content
with open(save_path, "wb") as file:
    file.write(image_data)
steps.append(f"Image saved as {save_path}")
…
With the image now generated and saved, the CoT process advances to analyzing this newly created image.
The input for this analysis step is the URL of the image generated in Step 3, stored as image_url. As mentioned earlier, in this notebook, the query text is currently set as a generic, yet travel-specific, request to GPT-4o. In subsequent chapters, this query text will become event-driven and more dynamic.
For our image analysis, we instruct the generative AI model to craft an engaging story based on the generated image:
query_text = "Providing an engaging story based on the generated image"
The code encapsulating the instructions is the same as in the previous steps. The CoT function now activates the image_analysis function as described previously in the Image generation and analysis section:
response = image_analysis(image_url, query_text)
The output is returned to the response variable and saved in the image_text.txt file for further use. This marks the completion of the CoT reasoning steps.
Upon completing all reasoning tasks, the CoT function signals the end of the process by clearing and updating the IPython display:
# Clear output and notify completion
with reasoning_output:
    reasoning_output.clear_output(wait=True)
    print("All steps completed!")
return steps
The IPython interface takes over from here. Let’s now run the CoT from a user perspective.
In this section, we’ll seamlessly run the complex GenAISys we’ve been building since the beginning of the book. A single prompt will trigger the entire CoT process.
We’ll simulate a user activating the reasoning capabilities of the GenAISys to obtain comprehensive ideation for an online travel agency. Specifically, we aim to predict customer-preferred activities, generate engaging images, and create storytelling narratives to evoke customers’ episodic memories. These episodic memories might be real-world experiences or dreams of visiting a place and engaging in particular activities.
To run this scenario, make sure to check the AI Agent and Files checkboxes and enter the following prompt carefully:
Use reasoning to suggest customer activities.
The Use, reasoning, customer, and activities keywords will be recognized by the AI agent and trigger the CoT process we built in this chapter. Alternatively, we could have implemented a drop-down menu or performed a similarity search in the Pinecone index to retrieve specific instruction scenarios. STT input is also possible. In this chapter, however, we’ll use typed prompts with keywords to clearly illustrate the CoT process.
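A minimal sketch of such a keyword trigger (the exact matching logic inside the AI agent may differ) could look like this:

```python
# Hypothetical trigger check: fire the CoT process only when all
# expected keywords appear in the user's prompt (case-insensitive)
COT_KEYWORDS = ("use", "reasoning", "customer", "activities")

def triggers_cot(prompt):
    lowered = prompt.lower()
    return all(keyword in lowered for keyword in COT_KEYWORDS)

print(triggers_cot("Use reasoning to suggest customer activities."))  # True
print(triggers_cot("Generate an image of Rome."))  # False
```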
In Chapter 7, we’ll build a central keyword registry and an orchestrator to further optimize the AI agent’s decision-making process.
Once the user presses Enter, all we have to do is sit back and watch just as we would with online ChatGPT-like copilots. The first process is to analyze the customer base to find the top-ranking activity based on daily data, as shown here.
Figure 5.10: Searching for activities
Once the whole process is complete, the decision tree classifier returns the results:
…Machine learning analysis result: The customers liked the Forum of Rome because it reminded them of how…
The next stage involves searching for suitable activities matching customer preferences:
Figure 5.11: Searching for activities matching customer needs
The creative output from GPT-4o provides structured steps to enhance the online offerings:
Activity suggestions: To enhance holiday trips with more activities, especially focusing on cultural experiences, we can consider a variety of options. Here's a structured approach to brainstorming and suggesting activities:
…### Step 3: Suggest Activities
1. Historical Tours and Sites:
- Athens, Greece: Visit the Acropolis and the Agora, where democracy was born. Include guided tours that explain the significance of these sites.
- Philadelphia, USA: Explore Independence Hall and the Liberty Bell, focusing on the birth of modern democracy.
- Westminster, UK: Tour the Houses of Parliament and learn about the evolution of the British democratic system…
Next, the CoT instructs DALL-E to generate an engaging image based on these suggested activities:
Figure 5.12: Image generation based on the output of the previous step
Because the Files checkbox is checked, the generated image is displayed. This image is a rather creative one and will vary with each run:
Figure 5.13: A cultural and historical image
In this case, the image contains text such as …understanding of history and its impact on modern life., which perfectly fits our request.
Note that each run might produce a different output due to context variations and the stochastic (probabilistic) nature of generative AI models such as GPT-4o.
The next process involves asking GPT-4o to create a narrative for a storytelling promotion that leverages episodic memory of past real-life experiences or imagined trips:
Figure 5.14: Creating an engaging story based on the image generated
The narrative output from GPT-4o shown here is illustrative and will vary, as noted earlier:
…Story response: In the bustling town of New Haven, a place where history and technology intertwined, a young historian named Clara discovered an ancient artifact that would change everything. The artifact, a mysterious tablet, was said to hold the secrets of the past, capable of bringing historical figures to life through augmented reality…
CoT 序列结束后,GenAISys 保持其推理状态,等待新的独立提示或进一步的 CoT 运行:
Once the CoT sequence concludes, the GenAISys maintains its reasoning state, waiting for new standalone prompts or further CoT runs:
图 5.15:GenAISys 中推理功能持续激活
Figure 5.15: Reasoning is persistently activated in the GenAISys
笔记本中的“加载并显示对话历史记录”和“加载并汇总对话历史记录”部分使用了第 4 章中详细介绍的相同功能。
The Load and display the conversation history and Load and summarize the conversation history sections in the notebook utilize the same functions detailed in Chapter 4.
我们已成功构建了一个小型的、类似 ChatGPT 的 GenAISys,并配备了自定义功能,包括多用户支持、特定领域的 RAG(检索增强生成)机制和定制化的 CoT(思维链)功能。在接下来的章节中,我们将把这个 GenAISys 框架应用于多个实际业务领域。
We’ve now successfully built a small-scale ChatGPT-like GenAISys equipped with custom features, including multi-user support, domain-specific RAG, and tailored CoT capabilities. In the upcoming chapters, we’ll apply this GenAISys framework across several practical business domains.
本章完成了 GenAISys 的基本框架,该框架由三层组成。第一层是一个 IPython 交互式界面,它充当协调器的角色。目前,它包含了语音功能、文件显示和 CoT 功能,以及用户输入、用户选择和 AI 代理组件。
In this chapter, we have completed the basic framework of the GenAISys, consisting of three layers. The first layer is an IPython interactive interface that acts as an orchestrator. It now includes voice capability, file display, and CoT features, alongside user inputs, user selections, and the AI agent widget.
第二层是AI代理编排器,由用户提示触发。这表明,在GenAISys系统中,由于这些组件的交互性,编排功能和控制功能之间的界限变得模糊。AI代理将任务分配给Pinecone索引(用于查询)和OpenAI API代理(用于生成任务,例如内容和图像生成)。AI代理还可以触发CoT流程,我们将在后续章节中进一步增强其功能。
The second layer is the AI agent orchestrator, triggered by user prompts. This demonstrates that within the GenAISys, the boundaries between orchestration and control functions are somewhat blurred due to the interactive nature of these components. The AI agent distributes tasks between the Pinecone index for querying and the OpenAI API agent for generative tasks, such as content and image generation. The AI agent can also trigger the CoT process, and we will further enhance its capabilities in the following chapters.
第三层也是最后一层包含了GenAISys的核心功能,其中包括由GPT-4o和DALL-E驱动的AI工作单元。在本章中,我们介绍了用于图像生成的DALL-E,并利用GPT-4o为这些图像提供富有洞察力的评论。此外,我们还实现了一个决策树分类器来预测客户行为,从而将机器学习功能集成到我们的GenAISys中。
The third and final layer contains the core functionality of the GenAISys, which involves AI workers powered by GPT-4o and DALL-E. In this chapter, we introduced DALL-E for image generation and utilized GPT-4o to provide insightful comments on these images. Additionally, we implemented a decision tree classifier to predict customer activities, incorporating machine learning capabilities into our GenAISys.
引入CoT功能标志着我们从终端用户的角度出发,迈出了构建无缝推理能力的第一步。复杂的任务需要能够模拟人类推理的先进人工智能系统。因此,在下一章中,我们将详细介绍GenAISys的推理能力以及其他功能。
Introducing the CoT feature marked our initial step toward creating seamless reasoning capabilities from an end user perspective. Complex tasks require sophisticated AI systems that can emulate human reasoning. Therefore, we will expand upon the reasoning abilities of the GenAISys, among other features, in the next chapter.
立即解锁本书的专属权益 Unlock this book’s exclusive benefits now
扫描此二维码或访问 packtpub.com/unlock,然后按书名搜索本书。 Scan this QR code or go to packtpub.com/unlock, then search for this book by name.
注意:开始之前请准备好您的采购发票。 Note: Keep your purchase invoice ready before you start.
成功的广告活动的核心在于记忆。想想你昨天看到的广告。一年前甚至几年前的广告呢?你印象最深刻的广告往往对你最有效,但对其他人来说可能并非如此。任何广告公司面临的主要挑战都是设计能够激发不同人群积极反应的宣传内容。更重要的是,成功的营销活动力求让消费者记住品牌、产品和服务。
The foundational concept of a successful advertising campaign is memory. Think about the advertisements you saw yesterday. What about those from one year ago or even several years ago? The ads you remember most vividly are the ones most effective for you, but perhaps not for someone else. The primary challenge for any advertising agency is designing promotional content that triggers positive reactions in diverse individuals. More crucially, successful marketing campaigns strive to make consumers remember brands, products, and services.
尼尔森神经科学团队(Brandt & Nieuwenhuis,2017)解释了记忆在广告中为何如此重要。他们证明,记忆会在短短24小时后显著衰退,这使得广告难以产生持久效果。重复以及内容的情感或认知冲击等多种因素可以增强记忆保持。诸如GenAISys之类的智能系统的出现重塑了营销格局,因为这些系统能够模拟人类专家的营销推理能力。
The Nielsen Neuroscience team (Brandt & Nieuwenhuis, 2017) explains why memory is so important in advertising. They demonstrate that memory decays significantly after just 24 hours, making it difficult for advertisements to have lasting effects. Several factors, including repetition and the emotional or intellectual impact of the content, can enhance memory retention. The emergence of agentic systems such as the GenAISys has reshaped the marketing landscape because these systems can replicate human-like expert marketing reasoning.
在本章中,我们将增强之前章节中构建的 GenAISys 系统。首先,我们将设计一个针对特定细分市场的消费者记忆代理。该代理的目标是分析消费者如何编码促销信息。我们将首先探讨记忆的重要性及其结构,考察关键的记忆类别,例如短期记忆、长期记忆、显性记忆和隐性记忆,以及重要的维度,例如智力编码和情感编码。接下来,我们将通过将对消费者记忆的更深入理解整合到 GenAISys 的知识库中来扩展其架构。然后,我们将开发一个利用前几章中介绍的多模态能力的战略性消费者记忆代理。该代理将采用受神经科学启发的方法来构建定制化的营销信息。通过引入OpenAI 的高级o3 推理模型中的元认知功能,我们将使该代理能够在多步骤的 CoT 推理过程中进行复杂且接近人类水平的自我反思。
In this chapter, we will enhance the GenAISys we’ve been building throughout previous chapters. First, we’ll design a consumer memory agent tailored to a specific market segment. The goal of this agent is to analyze how consumers encode promotional messages. We’ll begin by exploring why memory matters and how it is structured, examining key memory categories such as short-term, long-term, explicit, and implicit memory, as well as important dimensions such as intellectual and emotional encoding. Next, we’ll expand the architecture of the GenAISys by integrating a deeper understanding of consumer memory into its knowledge base. We’ll then develop a strategic consumer memory agent leveraging the multimodal capabilities introduced in earlier chapters. This agent will employ a neuroscience-inspired approach to craft customized marketing messages. By introducing meta-cognition through OpenAI’s advanced o3 reasoning model, we will enable the agent to perform sophisticated, near-human self-reflection within its multistep CoT reasoning process.
此外,我们将把生成式人工智能模型转化为一个类似神经科学的智能体,使其能够进行分析推理,而不仅仅是生成内容。复杂的系统——例如人脑——远大于其各部分之和,机器智能亦是如此。使用 OpenAI 的 o3 推理模型的战略消费者记忆智能体将运用复杂的神经科学提示,分析酒店评论中消费者的记忆编码模式。由此产生的洞察将输入到一个多模态推理流程中,该流程基于第五章“利用思维链添加多模态、多功能推理”中介绍的 CoT 框架。最终,GenAISys 将利用这种详细的记忆分析,使用 GPT-4o 生成定制化的营销内容,并配以 DALL-E 生成的图像。
Further, we will transform our generative AI model into a neuroscientific-like agent capable of analytic reasoning rather than mere content generation. Complex systems—like the human brain—are more than the sum of their parts, and the same applies to machine intelligence. The strategic consumer memory agent using OpenAI’s o3 reasoning model will apply complex neuroscience-informed prompts to analyze consumer memory encoding patterns in hotel reviews. The resulting insights will feed into a multimodal thread-of-reasoning pipeline, building upon the CoT framework introduced in Chapter 5, Adding Multimodal, Multifunctional Reasoning with Chain of Thought. Ultimately, the GenAISys will leverage this detailed memory analysis to produce tailored marketing content using GPT-4o, accompanied by images generated by DALL-E.
最后,我们将通过添加新功能进一步增强 IPython 交互式界面,包括一个能够触发智能体元认知的小部件,用于记忆分析和客户服务任务。用户可以选择分析各种类型的内容以获取与记忆相关的见解,或发起面向客户服务的 CoT 交互。
Finally, we’ll further enhance the IPython interactive interface by adding new features, including a widget capable of triggering agentic meta-cognition for memory analysis and customer service tasks. Users will have the option to analyze various types of content for memory-related insights or initiate customer-service-oriented CoT interactions.
本章结束时,您将学会如何基于我们消费者记忆代理的架构,构建一个适用于任何领域的、可定制的、推理驱动的 GenAISys。我们将逐步构建它。
By the end of this chapter, you will have learned how to build a customized, reasoning-driven GenAISys applicable to any domain based on the architecture of our consumer memory agent. We’ll construct it step by step.
本章涵盖以下主题:
This chapter covers the following topics:
让我们首先来设计增强型 GenAISys 界面及其人工智能驱动的功能。
Let’s begin by designing the enhanced GenAISys interface and its AI-driven functionalities.
消费者神经科学通过情感共鸣的个性化信息传递,可以显著提升品牌记忆度。本章首先分析消费者如何编码记忆。Nicks 和 Carriou (2016) 的研究表明,有效的消费者神经科学利用叙事沉浸式体验来讲述故事,使消费者产生情感共鸣,并对促销信息留下深刻印象。
Consumer neuroscience can significantly enhance brand memorability through emotionally resonant, personalized messaging. In this chapter, we begin by analyzing how consumers encode memories. Nicks and Carriou (2016) demonstrate that effective consumer neuroscience leverages storytelling through narrative transportation, where consumers become emotionally engaged and vividly remember promotional messages.
在我们的实施方案中,我们将深入分析消费者如何编码记忆,并秉持真实性原则。如果消费者对某项服务表示不满,我们的系统将相应调整信息,以突出改进后的产品和服务。我们的目标是通过令人难忘、触动人心的信息建立真诚的联系。
In our implementation, we’ll deeply analyze how consumers encode memories, maintaining an authentic approach. If a consumer expresses dissatisfaction with a service, our system will tailor messages to emphasize improved offerings. Our goal is to create genuine connections through memorable, emotionally resonant messages.
在本节中,我们将介绍如何增强我们在前几章中构建的 GenAISys:
In this section, we will describe how we enhance the GenAISys we have been building in the previous chapters:
我们先来探讨一下消费者记忆代理的使用案例。
Let’s first explore the consumer memory agent use case.
本章的用例展示了如何利用 CoT 消费者记忆代理增强 GenAISys。我们的目标是通过对酒店评论的详细分析,了解特定消费者在选择酒店时的需求。我们开发的系统能够进行全面的内容分析、评估情感并生成个性化的营销内容。CoT 代理首先对酒店评论进行详细的记忆和情感分析,从而启动这一过程。它识别心理特征、情感标记和特定的记忆标签,并为每个分析片段分配一个情感分数。这些分数被组合起来,生成一个总体情感分数和一个等级评分(0-5)。通过这种方法,该代理能够有效地捕捉特定酒店体验如何被消费者感知、进行情感处理并保留在记忆中。
The use case in this chapter demonstrates how the GenAISys can be enhanced with a CoT consumer memory agent. Our goal is to understand a specific consumer’s needs when selecting a hotel, using detailed analyses of hotel reviews. The system we develop performs comprehensive content analysis, evaluates sentiment, and generates personalized marketing content. The CoT agent initiates this process with a detailed memory and sentiment analysis of a hotel review. It identifies psychological features, emotional markers, and specific memory tags, assigning a sentiment score to each analyzed segment. These scores are combined to produce an overall sentiment score and a scaled rating (0–5). Through this approach, the agent effectively captures how particular hotel experiences are perceived, emotionally processed, and retained in a consumer’s memory.
基于这些提取的洞察和由此产生的认知画像,代理会利用评分来确定合适的营销策略。最终,消费者记忆代理会生成量身定制的推广内容,包括带有相关图片的个性化信息。例如,如果分析显示某位顾客通常不喜欢酒店,旅行社可以重点推荐该顾客之前积极提及的其他住宿选择或更优质的服务。在这种情况下,代理会生成一条带有吸引人图片的个性化信息,如图6.1所示。
Based on these extracted insights and the resulting cognitive profile, the agent uses the scaled rating to determine the appropriate marketing strategy. Ultimately, the consumer memory agent produces tailored promotional content, including a customized message accompanied by a relevant image. For instance, if the analysis reveals a customer who generally dislikes hotels, the travel agency can emphasize alternative accommodations or improved services that the customer has previously mentioned positively. In such cases, the agent generates a personalized message with an engaging image, as illustrated in Figure 6.1.
图 6.1:DALL-E 生成的基于自定义内容的图像
Figure 6.1: Custom content-based image generated by DALL-E
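下面是一个简短的 Python 示意草图(非书中代码;阈值和策略文案均为演示用的假设),说明如何根据上文所述的 0-5 等级评分来选择营销策略:
The following is a brief, illustrative Python sketch (not the book's code; the thresholds and strategy wording are assumptions for demonstration) of how the 0-5 scaled rating described above could drive the choice of marketing strategy:

```python
# Illustrative sketch: mapping the 0-5 scaled rating to a marketing
# angle. Thresholds and strategy names are hypothetical assumptions.
def choose_strategy(scaled_rating):
    if scaled_rating < 2.0:
        return "highlight alternatives and improved services"
    elif scaled_rating < 4.0:
        return "address mixed feedback with constructive offers"
    else:
        return "reinforce the positive experience"

# A mixed review, such as the 3/5 rating discussed above
print(choose_strategy(3.0))
```

对于总体不喜欢酒店的顾客(低评分),该草图会选择强调替代住宿方案,这与上文旅行社的例子一致。For a customer who generally dislikes hotels (a low rating), the sketch selects the "alternatives" angle, in line with the travel-agency example above.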
该代理通过分析从顾客酒店评论中识别出的认知画像,精心撰写热情且个性化的信息:
The agent crafts a warm, personalized message by analyzing the cognitive profile identified from customer hotel reviews:
Customer message: Dear Customer,
We invite you to consider a unique vacation experience that goes beyond the typical hotel stay. Imagine the warmth and comfort of staying with family or friends, where every moment is filled with genuine connections and cherished memories. From waking up to the aroma of freshly brewed coffee made just the way you like it to sharing home-cooked meals and laughter around a family table, this experience offers a heartfelt alternative to the impersonal nature of hotels. Embrace the spontaneity and freedom of living in the moment, surrounded by those who matter most. Whether it's a cozy evening of board games or a relaxed afternoon in the backyard, staying with loved ones provides a true home away from home. This vacation, choose the warmth of a family home and create memories that will last a lifetime. Book your stay with loved ones today and rediscover the true meaning of home.
Warm regards,
[Your Company Name]
生成式人工智能具有随机性,因此相同的输入不一定会产生相同的输出。因此,每次运行的结果可能都会有所不同。
Generative AI is stochastic, so the same input will not necessarily generate the same output. The response may thus change from one run to another.
现在,让我们定义消费者记忆代理将使用的记忆结构。
Let’s now define the memory structure that the consumer memory agent will use.
现在,让我们在第一章介绍的记忆类别基础上进行扩展,概述构建消费者记忆代理所必需的人类记忆结构。人类记忆是多方面的:短时记忆(STM)暂时保存完成即时任务或情绪处理所需的信息,若无强化便会迅速消退;长时记忆(LTM)长期存储重大事件、知识和经验;语义记忆存储独立于个人经验的一般知识和事实;情景记忆捕捉带有背景和细节的亲身经历事件;程序性记忆支持对行走或驾驶等任务的无意识提取;情绪记忆根据情绪强度(积极或消极)对经历进行分类;外显记忆涉及有意识的回忆,而内隐记忆则在无意识中运作。
Let’s now expand upon the memory categories introduced in Chapter 1 by outlining the human memory structures essential for building our consumer memory agent. Human memory is multifaceted: STM temporarily captures the information necessary for immediate tasks or emotional processing, quickly fading without reinforcement; LTM stores significant events, knowledge, and experiences over extended periods; semantic memory stores general knowledge and facts, independent of personal experience; episodic memory captures personally experienced events with context and detail; procedural memory enables unconscious retrieval of tasks, such as walking or driving; emotional memory categorizes experiences based on emotional intensity—positive or negative; and explicit memory involves conscious recall, whereas implicit memory operates unconsciously.
如图 6.2所示,我们的消费者记忆代理将灵活组合这些记忆类别来分析消费者内容。这种分类方式为 o3 OpenAI 推理模型提供了足够的自由度,使其能够有效地解释消费者数据。
Our consumer memory agent will analyze consumer content using a flexible combination of these memory categories, as shown in Figure 6.2. The categorization provides the o3 OpenAI reasoning model sufficient freedom to interpret consumer data effectively.
图 6.2:记忆代理的记忆类别
Figure 6.2: The memory categories of the memory agent
上层的主要类别如下:
The main categories at the upper level are the following:
请注意,无记忆、短时记忆和长时记忆构成一个子集(浅绿色),而现实记忆、虚构记忆和时间记忆(浅橙色)则彼此关联。这些类别并非孤立存在;它们在现实生活中动态地相互联系。换句话说,我们的记忆并非以子集的形式运作,而是通过人工智能中所谓的“标签”来运作。一段记忆可以是多个标签的组合:
Notice how memoryless, short-term, and long-term memory form a subset (light green), and reality, fiction, and time memory (light orange) are all connected. These categories aren’t isolated; they interconnect dynamically in real life. Our memory, in other words, doesn’t function in subsets but with what we can call tags in AI. A memory can be a combination of multiple tags:
在考察这些记忆子类别时,我们很快就会意识到,它们与主要记忆类别(例如语义短时记忆或情景长时记忆)之间存在着大量的标签组合。此外,记忆可以无缝融合不同的子类别;例如,“我去年去过罗马”这句话就同时包含了情景记忆、语义记忆和时间记忆标签。而且,我们的记忆范围很广,从内隐记忆(例如无意识地整天眨眼)到外显记忆(例如因受到刺激而有意眨眼)。
When examining these memory subcategories, we quickly realize the vast number of possible tag combinations with the main memory categories—such as semantic STM or episodic LTM. Additionally, memories can seamlessly blend subcategories; for instance, the phrase “I visited Rome last year” combines episodic, semantic, and temporal memory tags simultaneously. Moreover, our memories range from implicit (subconsciously blinking our eyes all day) to explicit (intentionally blinking due to an irritation).
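作为示意(字段名称为假设,并非书中的确切模式),“I visited Rome last year”这一片段同时携带多个记忆标签的情况可以表示如下:
As an illustration (the field names are assumptions, not the book's exact schema), the segment "I visited Rome last year" carrying several memory tags at once could be represented as follows:

```python
# Illustrative only: one segment blending several memory tags, as
# discussed above. Field names and values are hypothetical.
segment_analysis = {
    "segment": "I visited Rome last year",
    "memory_tags": ["episodic", "semantic", "time"],  # blended categories
    "explicit": True,          # consciously recalled, not implicit
    "sentiment_score": 0.5,    # factual statement, neutral valence
}
print(sorted(segment_analysis["memory_tags"]))
```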
在我们的消费者记忆代理中,我们将请求对内容进行全面分析,并为每个文本片段分配相应的记忆标签。然而,即使是这种细致的标记本身也不够。为了有效地捕捉消费者体验,我们将为每个记忆标签添加三个分析维度:
In our consumer memory agent, we will request a thorough analysis of content, assigning appropriate memory tags to each text segment. However, even this detailed tagging is not sufficient by itself. To effectively capture consumer experiences, we will enrich each memory tag with three analytical dimensions:
考虑到这些改进,现在让我们探讨如何将它们整合到我们不断发展的 GenAISys 架构中。
With these enhancements in mind, let’s now explore how we’ll integrate them into the architecture of our evolving GenAISys.
在本章中,我们将在 GenAISys 现有的三层架构基础上进行构建,如前面的图 5.3 所示,并在此处重现:
In this chapter, we will build upon the existing three-layer architecture of the GenAISys, as illustrated previously in Figure 5.3 and reproduced here:
图 6.3:事件驱动型 GenAISys 的三层结构
Figure 6.3: The three layers of the event-driven GenAISys
我们的方法将采用自下而上的方式,从基础功能开始,向上依次经过人工智能代理,直至 GenAISys 接口:
Our approach will be bottom-up, starting from the foundational functions and proceeding upward through the AI agent to the GenAISys interface:
基础功能(reason.py),专门针对消费者记忆代理和 CoT 推理而定制。我们还将开发一个独立的记忆分析函数,该函数提供类似神经科学的分析,适用于任何内容。The foundational functions (reason.py), specifically tailored for the consumer memory agent and CoT reasoning. We will also develop a standalone memory analysis function that provides neuroscientific-like analyses applicable to any content.
Let’s now begin building the consumer memory agent.
在本节中,我们通过赋予 GenAISys 分析酒店评论的神经科学能力,将其提升到一个新的水平。消费者记忆代理将捕捉用户的认知、情感和生理状态,并通过六步 CoT 流程解码每个评论片段,如图6.4所示:
In this section, we take our GenAISys to the next level by equipping it with neuroscientific capabilities for analyzing hotel reviews. The consumer memory agent will capture a user’s cognitive, emotional, and physical mindset, decoding each review segment through a six-step CoT process, as illustrated in Figure 6.4:
图 6.4:记忆代理的思维链过程
Figure 6.4: Chain-of-thought process of the memory agent
消费者记忆代理的 CoT 将使用 OpenAI 的 o3、GPT-4o 和 DALL-E 来运行其六个步骤:
The consumer memory agent’s CoT will use OpenAI’s o3, GPT-4o, and DALL-E to run its six steps:
在逐一开发这些步骤之后,我们将在接下来的“GenAISys 界面:从复杂到简单”一节中将它们完全整合起来,旨在将 CoT 功能推广到酒店评论之外。
After developing these steps individually, we’ll integrate them fully in the upcoming section, GenAISys interface: From complexity to simplicity, aiming to generalize the CoT functionality beyond hotel reviews.
首先,打开 GitHub(https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main)上 Chapter06 目录下的 1_Building_the_Consumer_Memory_Agent.ipynb 笔记本,该笔记本复用了之前构建的功能。我们将首先下载一个酒店评论数据集,作为 AI 代理的输入。
To begin our journey, open the 1_Building_the_Consumer_Memory_Agent.ipynb notebook, which reuses previously built functionality, within the Chapter06 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). We will first download a dataset of hotel reviews to provide inputs to the AI agent.
我们将使用合成酒店评论来构建记忆代理。在本章中,我们将处理酒店评论,并将代理的记忆结构推广到我们希望分析的其他内容。出于版权原因,我们使用的数据集是手动并借助生成式 AI 辅助工具创建的合成评论数据集。
We will be using synthetic hotel reviews to build the memory agent. In this chapter, we will process hotel reviews but also generalize the memory structure of the agent to other content we wish to analyze. For copyright reasons, the dataset we are using is a synthetic dataset of reviews created manually and with a generative AI copilot.
如果您想探索更多数据集,可以使用 Kaggle 上提供的类似数据集,其中包含 TripAdvisor 酒店评论,该数据集可用于非商业私人实现,网址为https://www.kaggle.com/datasets/andrewmvd/trip-advisor-hotel-reviews。
If you wish to explore more datasets, you can use a similar dataset containing TripAdvisor hotel reviews available on Kaggle for non-commercial private implementations at https://www.kaggle.com/datasets/andrewmvd/trip-advisor-hotel-reviews.
在 GitHub 上运行“设置环境”部分(与第 5 章相同),并直接从 GitHub 存储库下载数据集:
Run the Setting up the Environment section on GitHub, identical to Chapter 5, and download the dataset directly from the GitHub repository:
download("Chapter06","hotel_reviews.csv")
小贴士:使用 AI 代码解释器和快速复制功能,提升您的编码体验。在新一代 Packt 阅读器中打开本书,点击“复制”按钮 (1) 即可快速将代码复制到您的编码环境中,或点击“解释”按钮 (2) 让 AI 助手为您解释一段代码。
Quick tip: Enhance your coding experience with the AI Code Explainer and Quick Copy features. Open this book in the next-gen Packt Reader. Click the Copy button (1) to quickly copy code into your coding environment, or click the Explain button (2) to get the AI assistant to explain a block of code to you.
购买本书即可免费获赠新一代 Packt Reader 阅读器。扫描二维码或访问packtpub.com/unlock,然后使用搜索栏按书名查找本书。请仔细核对显示的版本,确保您购买的是正确的版本。
The next-gen Packt Reader is included for free with the purchase of this book. Scan the QR code OR visit packtpub.com/unlock, then use the search bar to find this book by name. Double-check the edition shown to make sure you get the right one.
我们将使用 pandas DataFrame 处理数据集。程序现在加载 CSV 文件并显示数据:
We will process the dataset with a pandas DataFrame. The program now loads the CSV file and displays the data:
import pandas as pd
# Load the CSV file into a Pandas DataFrame
dfta = pd.read_csv('/content/hotel_reviews.csv',sep=',')
# display the DataFrame
dfta
该数据集包含两列主要内容:Review和Rating。例如,记录 0 的评分相对积极,为 3 分,而记录 1 的评分则明显积极,为 5 分:
This dataset contains two primary columns: Review and Rating. For instance, record 0 has a relatively constructive rating of 3, while record 1 shows a clearly positive rating of 5:
图 6.5:酒店评论数据集节选
Figure 6.5: Excerpt of the hotel review dataset
然而,仅凭评分不足以提供足够的深度——我们需要细致入微的情感分析,才能充分理解客户满意或不满意的原因。我们将选择一条具有挑战性的评论作为分析的起点:
Ratings alone, however, don’t provide sufficient depth—we require a nuanced sentiment analysis to fully grasp why a customer was satisfied or dissatisfied. We will choose a challenging review to begin our analysis:
index_number = 0 # Specify the index number
程序现在可以提取评论及其评分:
The program now extracts the review and its rating:
try:
    # Extract the desired fields
    row = dfta.iloc[index_number]
    review = row['Review']
    rating = row['Rating']
    # Display the results
    print(f"Review: {review}")
    print(f"Rating: {rating}")
except IndexError:
    print(f"Error: Index {index_number} is out of bounds for the DataFrame.")
except KeyError as e:
    print(f"Error: Column '{e}' not found in the DataFrame.")
输出结果显示评论及其评分:
The output displays the review and its rating:
Review: We got a nice hotel. The parking was rather expensive. However, we got a good deal for my birthday. We arrived in during the night because of a late flight but the parking was open 24hours The check in was rapid. The room was a large size. Cool. But we didn't get the view expected. The bed was super comfortable and got a nice sleep for a few hours but then there was a raucous of a noisy crowd going to bed late. Anyway the price was acceptable and we made it to the shopping mall next to the hotel which reminded me of the one we had at home.
Rating: 3
我们选择了一条比较难的评论,因为它既包含负面情绪也包含正面情绪。评论中的负面部分将促使智能体生成建设性的解决方案。在继续之前,请自行分析每个评论片段的记忆标签、情感评分和维度。此练习有助于明确记忆类别的使用情况,并为将您的见解与智能体的分析进行比较提供基准。将提取的评论设置为初始输入:
We have chosen a difficult review because it contains both negative and positive sentiment. The negative aspect of the review will challenge the agent to generate constructive solutions. Before continuing, analyze the memory tags, sentiment scores, and dimensions of each review segment yourself. This exercise clarifies memory category usage and provides a benchmark for comparing your insights to the agent’s analysis. Set the extracted review as the initial input:
input1=review
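在自行做这项练习时,可以采用类似下面的结构来记录您的人工分析,以便之后与智能体的输出进行对比(结构与数值均为示意性假设;片段摘自上面的评论):
When doing this exercise yourself, a structure such as the following (an illustrative assumption; the segments are taken from the review above) can record your manual analysis for later comparison with the agent's output:

```python
# Hypothetical per-segment analysis of the first two review segments.
# Tags and scores are illustrative guesses, not the model's answer.
manual_analysis = [
    {"segment": "We got a nice hotel.",
     "memory_tags": ["episodic"], "dimension": "Emotional",
     "sentiment_score": 0.8},
    {"segment": "The parking was rather expensive.",
     "memory_tags": ["episodic", "semantic"], "dimension": "Intellectual",
     "sentiment_score": 0.3},
]
# Average the segment scores into a provisional overall sentiment
overall = sum(s["sentiment_score"] for s in manual_analysis) / len(manual_analysis)
print(round(overall, 2))
```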
现在,我们将针对 input1 为步骤 1 设计一条复杂的系统消息。
We will now design a complex system message for Step 1 for input1.
这一步骤通过 OpenAI o3 推理模型,将元认知和元推理融入到我们的消费者记忆代理中,从而引入了高级推理能力。换句话说,该代理不再只是简单地处理文本,而是会主动反思其内部推理过程,逐段分析文本,对记忆类型进行分类并赋予情感评分。
This step introduces advanced reasoning to our consumer memory agent by incorporating meta-cognition and meta-reasoning through the OpenAI o3 reasoning model. In other words, the agent won’t simply process text—it will actively reflect on its internal reasoning, performing a segment-by-segment analysis to categorize memory types and assign sentiment scores.
具体来说,o3模型将在我们精心设计的系统信息框架内运行,该框架将由我们详细设计。这条系统信息能够清晰地指导模型,促使其进行深度推理,并确保其基于类似人类的认知过程准确地分配记忆标签。我们无疑已经进入了能够进行推理和自我反思的人工智能时代!
Specifically, the o3 model will operate within our carefully structured system message, which we will design in detail. This system message guides the model clearly, prompting deep reasoning and ensuring it assigns memory tags accurately based on human-like cognitive processes. We are definitely in the era of reasoning and self-reflecting AI!
在本节中,我们将执行以下操作:
In this section, we will do the following:
为了实现模块化,这条消息(命名为 system_message_s1)将单独存储在一个 Python 文件中。This message, named system_message_s1, will be stored separately in a Python file for modularity.
Note that we use o1 as an umbrella term to signal to the LLM its role as a reasoning model. Additionally, the LLM may refer to o1 itself in responses though we call o3 as much as possible in the API.
现在让我们一步一步地构建这个详细的系统消息。
Let’s now construct this detailed system message step by step.
我们必须设计一个足够全面的系统消息,以便模型能够深入理解并执行受神经科学启发的记忆分析。为了实现这一点,我们将消息精心组织成清晰标记的各个部分,每个部分引导智能体完成分析的不同方面。
We must design a system message comprehensive enough for the model to deeply understand and execute a neuroscience-inspired memory analysis. To achieve this, we carefully structure the message into clearly labeled sections, each guiding the agent through different aspects of the analysis.
第一行在两个层次上为智能体定下基调。第一层次为智能体提供理解此高级记忆分析任务所需的概念。第二层次详细描述智能体的角色:
The first line sets the tone for the agent at two levels. The first level provides the agent with the necessary concepts to understand advanced memory analysis for this task. The second level describes the role of the agent in detail:
You are a generative AI model, an advanced memory-analysis model. Your role is to examine **each segment** of an incoming text and generate a set of "memory encoding tags," similar to how the human brain encodes memories in neuroscience. For every segment in the input, you will identify which categories apply, discuss the rationale, and assign additional metadata (dimension, sentiment, etc.).
现在,让我们一起来看看这些指令,以便了解代理通过消息的第一部分学到了什么:
Now, let’s go through the instructions to grasp what the agent is learning through this first part of the message:
- generative AI model, an advanced memory-analysis model:我们以一种特殊的方式设定系统的角色。我们要求模型进行思考,而不仅仅是生成文本。对于这项任务,我们不希望模型去创作,而是希望它进行分析和推理。We are setting the role of the system in a special way. We are asking the model to think, not just to generate text. For this task, we don’t want the model to create but to analyze and reason.
- examine **each segment**:我们正在训练模型复制神经科学方法。我们的大脑将信息编码成离散的单元。在这种情况下,我们要求模型模仿人类的记忆过程。文本的每个片段可以是一个句子、句子的一部分或一个段落。这样,模型就能以类似于人脑将信息编码为独立单元的方式来分析文本。We are teaching the model to replicate a neuroscience approach. Our brain encodes information in discrete packages. In this case, we are asking the model to mimic human memory processes. Each segment of text can be a sentence, a sentence piece, or a paragraph. This way, the model will analyze the text in a manner similar to how a human brain encodes information in independent packages.
- generate a set of "memory encoding tags," similar to how the human brain encodes memories:人类大脑会用标签来编码记忆,我们可以在较高层面使用这个术语,而无需深入探讨其生物学过程。我们的大脑会给编码的每一条信息都贴上标签,以便区分——例如区分过去事件与未来事件、语义数据或个人情感体验。这些标签代表了我们在人类生成的文本中寻找的记忆类别。Human brains encode memories with tags, a term we can use at a high level without going into the biological process. Our brains apply tags to every bit of information that they encode to differentiate a past event from a future event, for example, from semantic data or personal emotional experiences. These tags represent the categories of memory we are looking for in human-generated text.
- discuss the rationale, and assign additional metadata:模型必须解释其所标记的记忆类别背后的原理。每个类别,例如短时记忆(STM)或长时记忆(LTM),都必须加以解释。我们需要知道为什么某个片段会被赋予特定的记忆标签。模型还需为其描述添加维度,包括智力和情感方面的原因。The model must explain the rationale behind the category of memory it tags. Each category, such as STM or LTM, must be explained. We need to know why a memory tag was attributed to the segment. The model is asked to add dimensions to its description, including intellectual and emotional reasons.
You might notice a Markdown divider (---) in the code. It shows the model that we are now moving to another topic. This may seem unimportant, but we need to emphasize topic changes as we do when giving instructions to humans. Now, we will give the model a purpose.
第 3 行是标题行。这表明我们即将进入信息的第一个重要部分:
Line 3 is a header that shows the model that we are entering the first significant section of the message:
### 1. Purpose
第 4 行定义了 OpenAI 推理模型 o3 的目标:
Line 4 defines the goal of o3, OpenAI’s reasoning model:
The goal is for you, O1, to perform an **in-depth memory analysis** of each segment of the text. In other words, you will classify and label each segment you find using specific memory categories (also called "memory encoding tags"). This process provides insight into how different parts of the text might be encoded in human memory.
请注意,消息中包含“o1”,这里将其用作 OpenAI 推理模型的统称。其主要目的是让 API 理解我们期望进行推理。无论您选择哪个推理模型,此指令都会激活推理功能。本“目的”部分的关键内容强调了我们的期望:
Note that the message contains “o1,” which is used as an umbrella term for OpenAI’s reasoning models here. The main idea is for the API to understand that we expect reasoning. This instruction will activate reasoning no matter which reasoning model you select. The key parts of this Purpose section insist on what we expect:
in-depth memory analysis我们不需要传统的分析方法,而是需要深入探讨每个部分的细节。in-depth memory analysis: We do not want a classical analysis but a reflection that goes into the details of each segment.Classify and label each segment you find using specific memory categories这有力地表明了模型需要标记的记忆类别。我们再次提醒智能体,我们并非要生成文本,而是要对文本片段进行分类和标记。Classify and label each segment you find using specific memory categories: This is a strong indicator of the memory categories the model is expected to tag. Once again, we remind the agent that we don’t want to generate text but to classify and label segments.provides insight into how different parts of the text might be encoded in human memory这明确表明我们期望它具备类似人类的思维能力,并且复制了大脑编码记忆的方式。provides insight into how different parts of the text might be encoded in human memory: This is an explicit indication that we expect human-like thinking and replicates the way a brain encodes memories.现在我们需要给代理提供它需要的标题,以便它学习这些类别。开头几行提供了清晰的说明。现在,我们已经进入消息的第二部分。
We now need to give the agent the heading it needs to learn the categories. The first lines provide clear instructions. Now, we have reached section 2 of the message.
我们现在教授该代理如何识别不同类别的人类编码标签。我们正在深入探讨人类记忆编码的核心。这些记忆类别在本章“定义记忆结构”部分中讨论过:
We now teach the agent how to recognize different categories of human encoding tags. We are getting to the core of human memory encoding. The memory categories are those discussed in the Defining memory structures section of this chapter:
### 2. Memory Encoding Tags (Categories)
这个标题至关重要,因为智能体会通过这个标题提供的线索来学习我们期望的标签。现在,智能体已经吸收了标题信息。接下来,我们需要向模型清晰地解释我们期望它执行哪些操作:
This heading is vital as the agent will learn the tags we expect by taking a hint from this heading. Now, the agent has absorbed the heading. We then give the model a clear explanation of what actions we expect:
Below is a list of memory categories you must use. Consider them your "tagging schema." A single segment may exhibit one or more categories. If no category seems relevant, you may provide the special tag "memoryless" to indicate no significant memory encoding.
让我们重点关注这条信息的关键部分:
Let’s focus on the key parts of this message:
- tagging schema:使模型与人脑编码不同类别记忆的方式保持一致,使用标签区分过去和现在。Aligns the model with the way the human brain encodes different categories of memory, distinguishing the past from the present using tags.
- A single segment may exhibit one or more categories:向模型解释,记忆可以像在人脑中一样,被编码在多个类别中。Explains to the model that a memory can be encoded in more than one category, just like in a human brain.
- If no category seems relevant … memoryless:告诉模型,如果无法确定记忆的类别,则应分配无记忆标签。Tells the model that it should assign a memoryless tag if it cannot determine a category of a memory.
We then clearly define the categories (e.g., STM, LTM, episodic memory, semantic memory, time memory, reality memory, fiction memory, memoryless), as previously discussed:
1. **Short Term Memory (STM)**
- Used for information that seems fleeting, recently introduced, or relevant only in the immediate context.
…
8. **Memoryless**
- If a segment does not appear to connect to any memory encoding or is purely functional text (e.g., disclaimers, random filler), label it "memoryless."
虽然已经描述了记忆标签,但这些标签不足以捕捉人类记忆,人类记忆依赖于其他维度来编码事件。
The memory tags have been described but are insufficient to capture human memory, which relies on other dimensions to encode events.
这一“维度”部分为智能体的分析增添了智力、情感和生理特征。以下信息中对这些维度的描述已在之前的“定义记忆结构”部分中阐述过:
The dimensions section adds intellectual, emotional, and physical features to the agent’s investigation. The descriptions of these dimensions in the following message were described in the Defining memory structures section earlier:
Dimension Descriptions
1. Intellectual
   - Logical, analytical, or factual thought processes.
   - Explanation: If the text focuses on reasoned arguments, data,
     or factual details, it should be labeled "Intellectual."
2. Emotional
   - Feelings, mood, or affective elements.
   - Explanation: If the text displays happiness, sadness, or other
     strong emotional content, "Emotional" is assigned.
3. Physical (with Sensations)
至此,我们已经定义了记忆的类别和其他维度。然而,我们还需要对情绪进行更精细的分析。
With that, we have defined the memory categories and additional dimensions. However, we also need a more refined analysis of emotions.
情感得分的定义见“定义记忆结构”部分,它衡量片段的情感价值。它提供一个介于 0(负面)和 1(正面)之间的数值;如果无法检测到任何情感,则为 0.5(中性):
As defined in the Defining memory structures section, the sentiment score measures the emotional value of a segment. It provides a numerical score between 0 (negative) and 1 (positive), or 0.5 (neutral) if no sentiment can be detected:
### 4. Sentiment Score
Assign each segment a **sentiment score** between **0.0** and **1.0**, where:
- **0.0** = very negative
- **0.5** = neutral
- **1.0** = very positive
If a segment is purely factual with no emotional valence, use 0.5 (neutral).
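The 0.0–1.0 convention above can also be mirrored downstream in plain Python. The helper below is an illustrative sketch only; the 0.4 and 0.6 thresholds are our own assumption, not values from the book's notebook:

```python
def sentiment_label(score: float) -> str:
    """Map a 0.0-1.0 sentiment score to a coarse label.

    The 0.4/0.6 cut-offs are illustrative assumptions,
    not values taken from the notebook.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score < 0.4:
        return "negative"
    if score > 0.6:
        return "positive"
    return "neutral"
```

A purely factual segment scored 0.5 would therefore map to "neutral", matching the prompt's convention.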
请注意,邮件中的每个部分都以清晰的 Markdown 指示符开始和结束,以表明主题的变化。
Note that each section in the message begins and ends with clear Markdown indicators that show a change in topic.
接下来,我们将要求采用特定的回复格式。
Next, we are going to ask for a specific response format.
我们需要回复应清晰地展示原文的每个部分,为每个部分提供记忆标签,确定每个部分的维度(智力、情感或生理),提供情感评分,并提供简要解释以证明分析的合理性:
We need the response to clearly display each segment of the original text, provide memory tags for each segment, determine the dimension (intellectual, emotional, or physical) of each segment, provide a sentiment score, and provide a brief explanation to justify the analysis:
### 5. Format of the Response
For **each segment** in the incoming text:
1. Show the segment excerpt or a short summary.
2. **Memory Tags**: list any relevant categories.
3. **Dimension**: choose intellectual, emotional, or physical.
4. **Sentiment Score**: 0.0 → 1.0.
5. **Brief Explanation**: why these tags/dimensions.
为了确保模型理解我们的要求,我们提供了一个示例格式:
To make sure the model understands what we are asking for, we provide an example format:
Example format:
Segment 1: "Excerpt..."
- Memory Tags: [Time Memory Past, Reality Memory]
- Dimension: Emotional
- Sentiment Score: 0.7
- Explanation: The speaker refers to a past real event with positive affect.
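Because the response format is fixed, it can also be parsed mechanically later in a pipeline. The parser below is a hypothetical sketch (not part of the book's notebook) that turns blocks in the example format above into Python dictionaries:

```python
import re

# Matches one block in the example format shown above.
SEGMENT_RE = re.compile(
    r'Segment (?P<num>\d+): "(?P<excerpt>[^"]*)"\s*'
    r'- Memory Tags: \[(?P<tags>[^\]]*)\]\s*'
    r'- Dimension: (?P<dim>\w+)\s*'
    r'- Sentiment Score: (?P<score>\d+(?:\.\d+)?)'
)

def parse_segment_blocks(text: str) -> list[dict]:
    """Parse blocks in the example response format into dictionaries."""
    return [
        {
            "segment": int(m["num"]),
            "excerpt": m["excerpt"],
            "tags": [t.strip() for t in m["tags"].split(",")],
            "dimension": m["dim"],
            "score": float(m["score"]),
        }
        for m in SEGMENT_RE.finditer(text)
    ]
```

A structured parse like this is what makes the later extraction and decision steps possible without further model calls.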
如果我们编写的是传统的生成式人工智能模型消息,到此即可。然而,这是一条复杂的消息,因此我们需要添加指令来明确我们的预期。
If we were writing a traditional generative AI model message, we could stop here. However, this is a complex message, so we need to add instructions to insist on what we expect.
我们避免了让消息的前几部分过载。如果我们试图塞入过多指令,模型可能会混淆。我们要提醒系统,我们始终希望进行分段分析。我们坚持,如果模型找不到类别,我们希望它使用“无记忆”标签,而不是凭空捏造。此外,我们只希望得到简短明了的解释:
We avoided overloading the previous sections of the message. If we try to squeeze too many instructions in, the model might get confused. Let’s remind the system that we always want a segment-by-segment analysis. We insist that if the model doesn’t find a category, we want a “memoryless” tag and not a hallucination. Additionally, we only want short and clear explanations:
### 6. Additional Instructions
- Always analyze segment-by-segment.
- If no memory category applies, use "memoryless."
- Use a short but clear explanation.
现在到了棘手的部分。我们告诉模型,如果它完全找不到类别,就使用“无记忆”标签。但是,如果模型有想法但不是百分之百确定,那么它可以选择最有可能的有记忆标签,并附上一个强制性的情感评分:
Now comes the tricky part. We told the model that if it didn’t find a category at all, to use a “memoryless” tag. However, if the model has an idea but is not 100% sure, then it is allowed to pick the most probable memory tag along with a mandatory sentiment score:
- If uncertain about the correct memory category, pick the most likely.
- Always include a sentiment score.
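The "memoryless" fallback can also be enforced programmatically after the model responds, as a guard against hallucinated tags. The sketch below is our own illustration; the exact tag strings in ALLOWED_TAGS are assumptions and may differ from the strings in the book's prompt:

```python
# Assumed tag schema; adjust to match the exact strings in the prompt.
ALLOWED_TAGS = {
    "Short Term Memory (STM)", "Long Term Memory (LTM)",
    "Episodic Memory", "Semantic Memory", "Time Memory Past",
    "Reality Memory", "Fiction Memory", "Memoryless",
}

def enforce_schema(tags: list[str]) -> list[str]:
    """Drop any tag outside the schema; fall back to 'memoryless' if none survive."""
    kept = [t for t in tags if t in ALLOWED_TAGS]
    return kept if kept else ["memoryless"]
```

This mirrors in code what the instructions ask of the model: pick from the schema, and never invent a category.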
至此,我们已经向模型提供了大量指令。让我们确保它记住其主要任务。
At this point, we have provided the model with numerous instructions. Let’s make sure it remembers its primary task.
在向模型发出所有指令之后,我们在此提醒模型,其主要任务是对文本片段进行记忆标签分析。我们也期望输出格式符合以下定义:
After all the instructions we have given the model, we will remind the model that its primary task is a memory tag analysis of text segments. We also expect the format of the output to be structured as defined:
### 7. Primary Task
When I provide multisegment text, you must do a thorough memory-tag analysis for each segment. Return the results in the structured format above.
[End of System Prompt]
请注意,我们添加了 [End of System Prompt],以确保模型理解全局提示中的消息部分现在已经完全定义。我们使用术语 prompt 是为了确保模型将其理解为一组指令,而不仅仅是一条通用消息。
Note that we added [End of System Prompt] to make sure that the model understands that the message part of the global prompt is now completely defined. We use the term prompt to make sure that it understands it as a set of instructions, not just a general message.
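The overall shape of the system message, numbered markdown sections closed by an end marker, can be sketched as a small builder. This is an illustrative helper of our own, not the book's actual code:

```python
def build_system_prompt(sections: list[tuple[str, str]]) -> str:
    """Join (title, body) pairs into numbered '### n. Title' sections
    and close with the [End of System Prompt] marker."""
    parts = [
        f"### {i}. {title}\n{body}"
        for i, (title, body) in enumerate(sections, start=1)
    ]
    return "\n\n".join(parts) + "\n\n[End of System Prompt]"
```

Keeping the sections as data makes it easy to reorder or swap instructions without hand-editing one long string.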
现在我们准备运行记忆分析。
We are now ready to run the memory analysis.
我们设计的这条复杂系统消息存储在 GitHub 代码库 commons 目录下 cot_messages_c6.py 文件中名为 system_message_s1 的变量中。这样做的目的是将此消息以及其他步骤的消息与函数调用分开,以便 GenAISys 的 AI 代理能够将此步骤或其他步骤中的函数复用于不同的任务。
The complex system message we designed is stored in a variable named system_message_s1 in cot_messages_c6.py in the commons directory of the GitHub repository. The goal is to keep this message and those for other steps separate from the function calls so that the AI agent of the GenAISys can repurpose the function in this step or other steps for different tasks.
我们首先下载包含消息的文件:
We first download the file that contains the messages:
download("commons","cot_messages_c6.py")
然后我们导入system_message_s1消息以及步骤 4 中需要的消息,我们稍后会讨论步骤 4:
Then we import the system_message_s1 message and the messages we will need for Step 4, which we will discuss later:
from cot_messages_c6 import (
system_message_s1, generation,imcontent4,imcontent4b)
print(system_message_s1) # Print to verify
该 print 函数未被注释,会显示我们刚刚创建的消息。您可以随时将其注释掉,需要验证消息是否正确导入时再启用。现在我们准备发送给 o3 的消息:
The print function is left uncommented and will display the message we just created. You can comment it out, and re-enable it at any time to verify that the message has been imported correctly. We now prepare the messages for o3:
# Step 1 : Memory and sentiment analysis
mrole= system_message_s1
user_text=review
- mrole 是 system_message_s1,我们设计的系统消息。
- mrole is system_message_s1, the system message we designed
- user_text 是 review,从酒店评论数据集中选择的评论。
- user_text is review, the review selected from the hotel reviews dataset

现在我们调用 o3 并将结果存储在一个变量中:
We now call o3 and store the result in a variable:
retres=reason.make_openai_reasoning_call(user_text, mrole)
make_openai_reasoning_call 位于 reason(GenAISys 的 AI 库)中。它接收我们刚才定义的两个参数,创建一个 OpenAI 客户端,发出请求,并返回响应:
make_openai_reasoning_call is located in reason, the AI library of the GenAISys. It takes the two arguments we just defined, creates an OpenAI client, makes the request, and returns the response:
# Implemented in Chapter06
def make_openai_reasoning_call(user_text, mrole):
    system_prompt=mrole
    client = OpenAI()
    rmodel = "o3-mini" # o1 or other models. model defined in this file in /commons to make a global change to all the notebooks in the repo when there is an OpenAI update
    response = client.chat.completions.create(
        model=rmodel,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text}
        ],
    )
    return response.choices[0].message.content
本次调用,我们选择了 o3 推理模型系列的 o3-mini 版本。您也可以选择其他版本和推理模型。程序会显示在 retres 中接收到的输出:
For this call, we chose the o3-mini version of the o3 reasoning model series. Other versions and reasoning models can be chosen. The program displays the output received in retres:
# Print the generated output (memory analysis)
print(retres)
输出结果展示了系统消息的深度以及 o3 推理模型。如第一段所示,人工智能模型已将内容分解成多个片段,并解码了人工审阅者潜意识中使用的记忆标签。
The output shows the depth of the system message and the o3 reasoning model. The AI model has broken the content down into segments and decoded the memory tags subconsciously used by the human reviewer, as shown in the first segment.
该模型首先提供片段编号和该片段的内容。我们重点关注第 7 片段:
The model first provides the segment number and the content of that segment. Let’s focus on segment 7, which requires our attention:
Segment 7: "But we didn't get the view expected."
它还提供了对该片段进行编码的内存标签:
It also provides the memory tags that encoded this segment:
- Memory Tags: [Episodic Memory, Reality Memory]
接下来,它给出了维度,如下所示:
It continues by providing the dimension, which is as follows:
• Dimension: Emotional
然后它会给出一个情感评分,如下所示:
It then gives a sentiment score, which is as follows:
• Sentiment Score: 0.4
Finally, it produces an explanation that sums up its analysis:
- Explanation: The disappointment regarding the view introduces a negative emotional element to this real-life account, impacting the overall perception of the stay.
模型随后继续分析评论的所有部分。 至此,我们已完成复杂的记忆分析,为后续步骤奠定了基础。接下来,我们将提取情感得分。
The model then continues its analysis for all the segments of the review. We have now performed a complex memory analysis that sets the stage for the subsequent steps. Let’s proceed to extract the sentiment scores.
从这一点开始,存储在 review 中的原始输入不会再被使用。CoT 流程依赖于上一步的输出,而该输出会根据上下文不断变化。下一步是提取第一步(记忆和情感分析)中生成的所有片段的情感得分。我们需要这些信息来为第四步(内容创建)做出决策。
From this point on, the original input stored in review is not used again. The CoT process relies on the output of the previous step, which will continually vary depending on the context. The next step involves extracting the sentiment scores for all segments produced in Step 1: Memory and sentiment analysis. We will need this information to make decisions for Step 4: Content creation.
为了提取分数,我们首先创建一个extraction函数并提供详细说明:
To extract the scores, we first create an extraction function and provide detailed instructions:
def extract(tasks_response):
    umessage = """
    1) Read the following text analysis that returns detailed memory tags for each part of the text
    2) Then return the list of memory tags with absolutely no other text
    3) Use no formatting, no hashtags, no markdown. Just answer in plain text
    4) Also provide the sentiment analysis score for each tag in this format(no brackets) : memory tag sentiment Score
    """
我们已明确指示 GenAISys 仅以干净格式提供情感得分。现在,我们将使用 reason.make_openai_api_call 调用 GPT-4o,该函数定义于我们在之前章节中开始构建的 AI 库 reason.py 中。API 调用的输入是上一步的输出 retres,附加到指令消息 umessage 中:
We have clearly instructed our GenAISys to provide the sentiment scores in a clean format only. We will now call GPT-4o with reason.make_openai_api_call, defined in reason.py, the AI library we began building in the previous chapters. The input to the API call is the output of the last step, retres, appended to the instruction message, umessage:
umessage+=retres
该system角色提醒代理人其心理营销功能:
The system role reminds the agent of its psychological marketing function:
mrole = "system"
mcontent = "You are a marketing expert specialized in the psychological analysis of content"
该user角色输入用户消息,umessage并进行 API 调用:
The user role introduces the user message, umessage, and the API call is made:
user_role = "user"
task_response = reason.make_openai_api_call(
umessage,mrole,mcontent,user_role
)
return task_response
代理返回结果task_response,我们将从中提取记忆情感得分,进行处理和验证:
The agent returns task_response, from which we will extract the memory sentiment scores, process, and verify:
# Step 2: Extract scores
task_response=extract(retres)
print(task_response)
输出结果为每个记忆标签对应的每个片段的预期得分列表:
The output is the list of scores per segment we expected for each memory tag:
Reality Memory sentiment 0.8
Episodic Memory sentiment 0.8
Reality Memory sentiment 0.4
Episodic Memory sentiment 0.4
Episodic Memory sentiment 0.8
Reality Memory sentiment 0.8
Time Memory Past sentiment 0.8
Episodic Memory sentiment 0.5…
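The plain-text lines above follow a predictable "tag sentiment score" shape, so they can be consolidated into pairs without another model call. A possible parsing sketch (our own, not the notebook's code):

```python
import re

def parse_tag_scores(text: str) -> list[tuple[str, float]]:
    """Parse lines like 'Reality Memory sentiment 0.8' into (tag, score) pairs."""
    pairs = []
    for line in text.splitlines():
        m = re.match(r"(.+?)\s+sentiment\s+(\d+\.\d+)", line.strip())
        if m:
            pairs.append((m.group(1), float(m.group(2))))
    return pairs
```

Keeping the tag attached to its score would let a later step weight categories differently, rather than averaging all scores equally.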
We now need to consolidate these scores to use them for decision-making.
对于此功能,我们将使用一个简单的非 AI 正则表达式(re)模块进行模式匹配和提取。这表明 GenAISys CoT 可以包含非 AI 函数,从而将其应用范围扩展到生成式 AI 模型之外。
We will use a simple non-AI regular expressions (re) module for this function for pattern matching and extraction. This shows that a GenAISys CoT can contain non-AI functions that expand its scope beyond generative AI models.
待分析的文本是上一步的输出结果:
The text to analyze is the output of the previous step:
# Input text
text=task_response
我们正在寻找小数:
We are looking for decimals:
# Regular expression to extract sentiment scores
pattern = r"(\d+\.\d+)"
scores = [float(match) for match in re.findall(pattern, text)]
然后我们显示分数:
We then display the scores:
# Output the extracted scores
print("Extracted sentiment scores:", scores)
输出结果包含分数:
The output contains the scores:
Extracted sentiment scores: [0.8, 0.8, 0.4, 0.4, 0.8, 0.8, 0.8, 0.5, 0.5, 0.5, 0.7,…
We first calculate an overall score if the function returned scores:
# Optional: calculate the overall score and scaled rating
if scores:
    overall_score = sum(scores) / len(scores)
然后我们将分数从 1 到 5 进行分级:
Then we scale the score from 1 to 5:
scaled_rating = overall_score * 5
最后,我们展示总分和换算后的分数:
Finally, we display the overall score and the scaled score:
print("Overall score (0–1):", round(overall_score, 2))
print("Scaled rating (0–5):", round(scaled_rating, 2))
输出结果符合预期:
The output is what we expected:
Overall score (0–1): 0.63
Scaled rating (0–5): 3.14
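The consolidation steps, taking the mean of the extracted scores and rescaling it to 0-5, can be wrapped into one helper. This is a sketch under the same conventions as the notebook, not its exact code:

```python
def consolidate_scores(scores: list[float]) -> tuple[float, float]:
    """Return (overall_score on 0-1, scaled_rating on 0-5), rounded to 2 places."""
    if not scores:
        raise ValueError("no sentiment scores extracted")
    overall_score = sum(scores) / len(scores)
    return round(overall_score, 2), round(overall_score * 5, 2)
```

Raising on an empty list makes the failure explicit instead of silently skipping the rating, as the notebook's `if scores:` guard does.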
输出结果需要一些人工分析。
The output requires some human analysis.
不过就目前而言,我们已经掌握了确定要创建的内容所需的信息。
For now, however, we have the information we need to determine the content to create.
在决定要创建的内容之前,代理会读取信息消息。第一条消息是 umessage4:
Before deciding on the content to create, the agent reads the information messages. The first message is umessage4:
from cot_messages_c6 import umessage4
该消息包含创建推广活动的说明。我们将消息存储在一个变量中,以便根据任务的不同,调用该函数时可以使用不同的提示信息。
The message contains instructions on how to create a promotional campaign. We are keeping the message in a variable so that the function can be called with different prompts depending on the task.
代理必须首先使用分析过的内存标签来生成文本,而不是分析文本:
The agent must first use the memory tags analyzed to generate, not analyze, a text:
umessage4 = """
1) Your task is to generate an engaging text for a customer based on a memory analysis of a text
2) The analysis of the text is provided in the following format: text segment, memory tags, dimension, sentiment score, and explanation
The text also contains the overall sentiment score and the list of memory tags in the text
3) Use no other memory tags than those provided to generate your engaging text
然后,代理程序会收到关于情感分析的指令:
Then, the agent receives instructions on the sentiment analysis:
4) Use the overall sentiment score to give the tone of your response
If the overall sentiment score is positive write an engaging text addressing each segment with its memory tag and sentiment score
If the overall sentiment score is negative analyze why and find ideas and solutions to find a way to satisfy the customer
If the overall sentiment score is negative analyze make sure to show empathy for this negative feeling and then make the transition from negative to positive
4) Focus on the topic provided that begins with the term the topic which focuses on the core topic of the text to make the customer happy
然后,代理接收到关于要生成的内容的最终指令:
Then, the agent receives final instructions on the content to generate:
5) Use your training to suggest named entities for that topic to make sure that the customer receives a message tailored to the memory tags and sentiment score
现在,我们通过添加获得的缩放评分和代理找到的记忆标签来创建输入:
We now create the input by adding the scaled rating we obtained and the memory tags the agent found:
ugeneration=generation + "The advanced memory analysis of each segment of a text with a sentiment score:" + retres + " the scaled overall rating: "+ str(scaled_rating)+ " and the list of memory tags of the text "+ task_response
代理现在已经完整掌握了预期任务。我们用 imcontent4 来说明智能体的角色:
The agent now has a complete representation of the task expected. We explain the agent’s role with imcontent4:
imcontent4 = "You are a marketing expert specialized in the psychological analysis of content"
代理现在已准备好通过make_openai_api_call调用来运行生成操作:
The agent is now ready to run the generation with the make_openai_api_call call:
ugeneration=generation + …
mrole4 = "system"
mcontent4 = imcontent4
user_role = "user"
pre_creation_response = make_openai_api_call(
ugeneration,mrole4,mcontent4,user_role
)
print(pre_creation_response)
响应 pre_creation_response 在评论情绪为负面时会表达同情,否则会与评论的语气相符:
The response is pre_creation_response, which is empathetic if the sentiment is negative or adapts to the tone of the review otherwise:
**Segment 7: "But we didn't get the view expected."**
Memory Tags: [Episodic Memory, Reality Memory]
Sentiment Score: 0.4
It's understandable to feel a bit let down when expectations aren't met. For future stays, …
输出结果采用认知格式。我们将运行相同的调用,但这次会添加一条消息,用于清理和准备内容以生成图像:
The output is in a cognitive format. We’re going to run the same call but with a message to clean up and prepare the content for image generation:
umessage4b="Clean and simplify the following text for use as a DALL-E prompt. Focus on converting the detailed analysis into a concise visual description suitable for generating an engaging promotional image" + pre_creation_response
mrole4b = "system"
mcontent4b = imcontent4b
user_role4b = "user"
creation_response = make_openai_api_call(
umessage4b,mrole4b,mcontent4b,user_role4b
)
print(creation_response)
输出结果是一份清晰的指令,指导如何创作一幅展现令人兴奋的奢华产品形象的图片,这样的产品总是会受到人们的喜爱:
The output is a clear instruction to create an image with an exciting luxurious offer that is always appreciated:
"Luxurious hotel stay with spacious rooms and swift check-in; enjoy a comfortable bed and convenient 24-hour parking. Celebrate with special deals and nearby shopping reminiscent of home. Despite minor noise and view issues, the overall experience is positive and memorable."
每次运行请求时,消息的输出结果可能会有所不同,但语气应该保持一致。此外,我们还可以调整指令以生成其他内容。在本例中,代理已准备好使用此指令创建图像。
The output of the message may vary each time we run the requests, but the tone should remain the same. Also, we can adapt the instructions to other content to generate. In this case, the agent is all set to use this instruction to create an image.
在这一阶段,消费者记忆代理使用在步骤 4:内容创建期间生成的指令(creation_response)使用 OpenAI 的 DALL-E 创建定制的宣传图像:
At this stage, the consumer memory agent uses the instructions (creation_response) generated during Step 4: Content creation to create a tailored promotional image using OpenAI’s DALL-E:
# Step 5: Creating an image
import requests
prompt=creation_response
image_url = reason.generate_image(prompt)
generate_image(prompt) 函数复用自上一章。通过持续复用函数,我们可以降低开发开销并确保代码的可维护性。与第 5 章一样,生成的图像存储在文件 c_image.png 中:
The generate_image(prompt) function is reused from the previous chapter. By consistently reusing functions, we reduce the development overhead and ensure code maintainability. As in Chapter 5, the image is generated and stored in a file as c_image.png:
save_path = "c_image.png"
image_data = requests.get(image_url).content
with open(save_path, "wb") as file:
file.write(image_data)
图片已准备就绪,将与最终的个性化信息一同展示。我们将在流程结束时显示该图片。
The image is now ready to accompany our final personalized message. We will display the image at the end of the process.
宣传图片准备就绪后,现在我们来生成一条简洁而引人入胜的客户信息。首先,我们确认步骤 5(创建图像)中的 creation_response 可用:
With the promotional image prepared, we now generate a concise and engaging customer message. First, we confirm that creation_response from Step 5: Creating an image is available:
if creation_response != "":
umessage = """
1) Read the following text carefully
2) Then sum it up in a paragraph without numbering the lines
3) The output should be a text to send to a customer
"""
代理程序输出的信息经过润色,适合与客户沟通:
The output from the agent provides a polished message, suitable for customer communication:
Dear Customer,
Experience a luxurious hotel stay with spacious rooms and a swift check-in process. Enjoy a comfortable bed and the convenience of 24-hour parking. Take advantage of special deals and nearby shopping that feels like home. While there may be minor noise and view issues, the overall experience remains positive and memorable.
Best regards,
现在,如果需要,我们可以使用 Python 的 textwrap 以另一种格式显示输出:
We can now display the output in another format if we wish to, with Python’s textwrap:
import os
from IPython.display import Image, display
import textwrap
# Set the desired width for each line
line_width = 70
# Wrap the text to the specified width
wrapped_message = textwrap.fill(process_response, width=line_width)
print(wrapped_message)
显示的信息清晰、专业,适合直接与客户沟通:
The displayed message is clear, professional, and suitable for direct customer outreach:
Dear Customer, Experience a luxurious hotel stay with spacious rooms
and a swift check-in process. …
为了增强此信息,您可以调整提示,省略有关人工智能或任何内部系统细节的说明,而只关注以客户为中心的语言。
To enhance this message, you might adjust the prompt to omit references to AI or any internal system details, focusing purely on customer-oriented language.
最后,系统会将生成的图像与信息一起显示,从而创建一个吸引人的个性化宣传包:
Finally, the system displays the generated image alongside the message to create an appealing, personalized promotional package:
# Define the image path
image_path = "/content/c_image.png"
# Check if the image file exists
if os.path.exists(image_path):
# Display the image
display(Image(filename=image_path))
else:
print(f"Image file {image_path} not found.")
最终呈现的视觉效果突出了酒店升级后的豪华品质,与根据客户评论分析得出的期望完美契合:
The resulting visual emphasizes the hotel’s upgraded, luxurious offering, perfectly aligned with the customer’s expectations based on their review analysis:
图 6.6:为顾客升级酒店住宿
Figure 6.6: An upgrade for a stay in the hotel for a customer
现在您可以尝试添加更多评论,测试代理的深度和灵活性。我们已成功开发出一款受神经科学启发的复杂认知理论(CoT)消费者记忆代理。在下一节中,我们将把整个流程集成到reason.py人工智能库中,并进一步增强我们的 GenAISys 框架。
You can now experiment with additional reviews, testing the depth and flexibility of the agent. We have successfully developed a sophisticated, neuroscience-inspired CoT consumer memory agent. In the next section, we’ll integrate this full process into the reason.py AI library and further enhance our GenAISys framework.
本章的探索之旅带领我们深入进入了自我反思、推理和元认知智能体的时代。在最后一节中,我们将从消费者记忆 CoT 的复杂内部运作机制转向简洁直观的用户体验。我们将添加一个 CoT 组件,使用户能够触发对任意文本的记忆分析或完整内容生成。然后,我们将扩展 AI 智能体,使其能够响应该组件的选项。最后,我们将以航班评论为例,演示通用工作流程,展示相同的记忆逻辑如何应用于新的领域。
Our journey in this chapter has taken us deeper into the era of self-reflecting, reasoning, and meta-cognitive agentic AI. In this final section, we shift from the intricate inner workings of our consumer-memory CoT to a clean, intuitive user experience. We’ll add a CoT widget that lets any user trigger memory analysis or full content generation on arbitrary text. We’ll then extend the AI agent so it reacts to that widget’s options. Finally, we’ll demonstrate the generalized workflow on a flight review to show how the same memory logic applies to new domains.
在 GitHub 上打开2_Running_the_Reasoning_GenAISys.ipynbnotebook。然后运行“设置环境”部分,该部分与第 5 章中的 notebook 完全相同。我们将首先向 IPython 接口添加一个 CoT 小部件。
Open the 2_Running_the_Reasoning_GenAISys.ipynb notebook on GitHub. Then run the Setting up the Environment section, which is identical to the notebook in Chapter 5. We will begin by adding a CoT widget to the IPython interface.
为了让记忆代理简单直观,我们引入了一个简洁的下拉菜单(图 6.7)。用户可以轻松选择希望 GenAISys 代理执行的任务:
To make the memory agent simple and intuitive, we introduce a straightforward drop-down menu (Figure 6.7). Users can effortlessly select the task they wish the GenAISys agent to perform:
这种简化的用户交互方式显著降低了最终用户的复杂性,将复杂的内部操作转移到了后台。
This streamlined user interaction significantly reduces complexity for the end user, shifting the sophisticated internal operations into the background.
图 6.7:选择推理任务
Figure 6.7: Choosing the reasoning task
该组件的实现分为三个步骤——添加组件、添加观察者以及将选项发送给 AI 代理:
The widget is implemented in three steps—adding the widget, adding an observer, and sending the options to the AI agent:
添加该小部件只需几行代码。我们在 IPython 接口中定义下拉菜单(instruct_selector):
Adding the widget only requires a few lines of code. We define the drop-down menu (instruct_selector) within the IPython interface:
# Ensure 'Instructions' exists in the memory_selector options
instruct_selector = Dropdown(
    options=["None","Analysis", "Generation"],
    value="None", # Ensure default active_memory is in the options
    description='Reasoning:',
    layout=Layout(width='50%')
)
下拉菜单提供清晰的选项,确保用户能够轻松理解他们的选择:无、分析或生成。接下来,我们将此功能整合instruct_selector到现有界面布局中(VBox):
The dropdown provides clear options, ensuring users easily understand their choices: None, Analysis, or Generation. Next, we incorporate instruct_selector into the existing interface layout (VBox):
VBox(
    [user_selector, input_box, agent_checkbox,
     tts_checkbox, files_checkbox, instruct_selector],
    layout=Layout(display='flex', flex_flow='column',
        align_items='flex-start', width='100%')
)
当用户提交选择后,处理程序会使用标准提交代码更新输出消息,让用户看到该选择已被采纳:
When the user submits their choice, a handler updates the output messages for the user to see that the choice has been taken into account using standard submission code:
def handle_submit(sender):
    user_message = sender.value
    if user_message.strip():
        sender.value = "" # Clear the input box
        # Check if instruct_selector is "Analysis" or "Generation"
        if instruct_selector.value in ["Analysis", "Generation"]:
            with reasoning_output:
                reasoning_output.clear_output(wait=True)
                print("Thinking...") # Display "Thinking..." only when
                print("Reasoning activated") # Restore default message…
我们希望显示“正在思考……”,以向用户表明系统正在运行。
We want “Thinking…” to be displayed to signal to the user that the system is working.
instruct_selector 通过 instruct_selector.observe 注册其观察者:
instruct_selector is called by instruct_selector.observe:
instruct_selector.observe(on_instruct_change, names='value')
所选选项(active_instruct)被集成到代理的执行路径中:
The selected option (active_instruct) is integrated into the agent’s execution path:
if agent_checkbox.value:
    pfiles = 1 if files_checkbox.value else 0
    active_instruct = instruct_selector.value # Retrieve the selected instruction
    response = chat_with_gpt(user_histories[active_user],
        user_message, pfiles, active_instruct)
只需集成这几行清晰的代码,人工智能代理就能动态激活相应的推理模式,无需增加用户操作的复杂性。现在我们可以进一步增强人工智能代理的功能。
By clearly integrating these few lines, the AI agent dynamically activates the appropriate reasoning mode without additional user complexity. We can now enhance the AI agent.
AI 代理现在将通过一个名为 active_instruct 的新参数接收用户在小部件中的选择,该参数将触发其决策:
The AI agent will now receive the user’s widget selection in a new argument named active_instruct that will trigger its decisions:
def chat_with_gpt(messages, user_message,files_status,active_instruct):
当用户选择“分析”时,AI代理会触发先前构建的推理函数,make_openai_reasoning_call以执行记忆分析:
When the user selects Analysis, the AI agent triggers the previously built reasoning function, make_openai_reasoning_call, to perform memory analysis:
if active_instruct=="Analysis" and continue_functions==True:
    from cot_messages_c6 import (
        system_message_s1, generation,imcontent4,imcontent4b)
    mrole= system_message_s1
    reasoning_steps=reason.make_openai_reasoning_call(
        user_message, mrole
    )
    aug_output=reasoning_steps
    continue_functions=False
同样,当选择“生成”时,代理将使用自定义memory_reasoning_thread函数运行完整的内存代理工作流程:
Similarly, when Generation is selected, the agent runs the complete memory agent workflow using the custom memory_reasoning_thread function:
if active_instruct=="Generation" and continue_functions==True:
    from cot_messages_c6 import (
        system_message_s1, generation,imcontent4,imcontent4b)
    reasoning_steps = reason.memory_reasoning_thread(
        user_message,system_message_s1,
        generation,imcontent4,imcontent4b
    )
    aug_output=reasoning_steps
    continue_functions=False
通过这些简单的集成,GenAISys 界面现在可以透明地提供强大的推理功能。
With these straightforward integrations, the GenAISys interface now provides powerful reasoning capabilities transparently.
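The branching on active_instruct amounts to a small dispatch table. The following is a minimal sketch of the routing idea only; the handler callables are hypothetical stand-ins for the notebook's functions:

```python
def route_instruction(active_instruct: str, user_message: str, handlers: dict):
    """Dispatch the widget selection to the matching reasoning handler.

    handlers maps "Analysis"/"Generation" to callables; "None" (or any
    unknown option) returns None so the default chat path runs instead.
    """
    handler = handlers.get(active_instruct)
    if handler is None:
        return None
    return handler(user_message)
```

A dispatch table keeps the agent's decision logic in one place, so adding a new drop-down option later means registering one more handler rather than editing a chain of if statements.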
通过增强的界面,用户可以轻松地将记忆代理推理应用于酒店评论以外的各种内容。让我们用一条关于航班的普通用户评论来说明这种灵活性。用户只需选择“分析”,然后提交航班评论,无需其他操作说明:
With the enhanced interface, users can easily apply memory agent reasoning to diverse content beyond hotel reviews. Let’s illustrate this flexibility with a general user comment regarding a flight. The user simply selects Analysis, then submits a flight review without additional instructions:
User01: The flight from Tokyo to Honolulu was nice in general but the food was awful!
GenAISys 可立即返回详细的、逐段的内存分析:
The GenAISys instantly returns a detailed, segment-by-segment memory analysis:
Agent: Segment 1: "The flight from Tokyo to Honolulu was nice in general"
- Memory Tags: [Reality Memory, Time Memory Past]
- Dimension: Emotional
- Sentiment Score: 0.7
- Explanation: This segment recounts a real past experience (a flight) and expresses a positive emotional evaluation ("nice"). The reference to a completed event classifies it under reality and past time memory…
用户随后重新输入该句子,但这次勾选“生成”和“文件”选项,以便显示根据文本生成的图像。用户只需选择选项、输入文本并提交即可。整个过程同样流畅无缝,用户无需任何额外说明,响应也很完整:正如我们在上一节中设计的那样,从记忆分析开始,到情感评分、内容生成,最终生成定制的客户信息和一张引人入胜的图像(参见图 6.8);如果界面中勾选了 Files 选项,该图像就会显示出来。
The user then reenters the sentence, but this time with the Generation option and the Files option checked, so that the image generated with the text will be displayed. All the user has to do is select the option, enter the text, and submit it. Once again, the experience is seamless and no additional instructions are required on the part of the user. The response is complete: the full analysis, process, and final customer message as we designed in the previous section, beginning with memory analysis, proceeding to sentiment scoring and content generation, and concluding with a tailored customer message and an engaging image (see Figure 6.8) that is displayed if Files is checked in the interface:
Process: Performing memory and sentiment analysis.
Memory analysis result: Segment 1: "The flight from Tokyo to Honolulu was nice in general but the food was awful!"
- Memory Tags: [Reality Memory, Episodic Memory, Time Memory Past]
- Dimension: Emotional
- Sentiment Score: 0.4
- Explanation: This segment recounts a specific past travel event (a flight between two cities), embedding personal experience and subjective evaluation (positive about the flight overall and strongly negative about the food). The use of experiential details and judgment indicates that the experience is encoded as a real, episodic, past memory with an emotional emphasis, resulting in an overall slightly negative sentiment.
…
Dear Customer,
Experience a serene flight from Tokyo to Honolulu while enjoying a gourmet meal inspired by the renowned Chef Nobu Matsuhisa. Indulge in diverse and vibrant dishes crafted to enhance your journey.
Best regards,
图 6.8:与个性化信息相匹配的引人入胜的客户图片
Figure 6.8: An engaging customer image to match the personalized message
我们可以看到,从用户角度来看,我们的 GenAISys 运行流畅。我们让用户感觉生成式人工智能的一切都很简单。当然,在实际项目中,我们需要投入资源尝试各种类型的文本,找出局限性,并解决问题以覆盖各种极端情况并优化输出。现在,让我们总结本章内容,并将 GenAISys 提升到一个新的水平。
We can see that from a user perspective, our GenAISys is running seamlessly. We are giving the user the illusion that everything in generative AI is simple. Of course, in a real-life project, we would have to spend resources trying all types of texts, finding the limitations, and solving the issues to cover edge cases and refine outputs. Let’s now sum up our journey in this chapter and take the GenAISys to yet another level.
本章将我们的 GenAISys 系统远远推向了超越传统人工智能的境界,进入了元认知和自我反思推理的领域。我们定义了一个实用记忆模型,该模型结合了基本类别(短期记忆、长期记忆、现实记忆、虚构记忆和时间记忆)以及语义和情景标签,并在此基础上叠加了智力、情感和生理维度。基于此框架,我们构建了一个六步 CoT 智能体,它可以解码每个评论片段,标记记忆类别,量化情感,并生成一个整体认知评分。基于认知特征和情感评分,该智能体生成了个性化的推广文本,并创建了与之匹配的 DALL-E 图片——最终将所有内容整合为一条精炼的客户信息。
This chapter pushed our GenAISys far beyond classical AI, into the realm of meta-cognitive, self-reflective reasoning. We defined a pragmatic memory model combining primary categories (short-term, long-term, reality, fiction, and time) with semantic and episodic tags, then layered intellectual, emotional, and physical dimensions on top. Using this framework, we built a six-step CoT agent that decodes each review segment, tags memory categories, quantifies sentiment, and produces an overall cognitive score. Based on the cognitive profile and sentiment score, the agent generated personalized promotional text and created a matching DALL-E image—then wrapped everything into a polished customer message.
新增的下拉菜单组件允许用户选择“无”、“分析”或“生成”,从而只需单击一下即可完成复杂的推理任务。在后台,AI 代理会将请求路由到独立的内存分析或完整的消费者内存工作流程。我们最终在一个航班评论中演示了该代理,结果表明它可以分析、评分并回复任何文本——将 GenAISys 的应用范围从酒店业扩展到更广泛的客户服务场景。
A new drop-down widget now lets users choose None, Analysis, or Generation, making sophisticated reasoning tasks a single-click experience. Behind the scenes, the AI agent routes requests to either a standalone memory analysis or the full consumer-memory workflow. We finally demonstrated the agent on a flight review, showing it can analyze, score, and respond to any text—extending GenAISys from hospitality into broader customer service scenarios.
凭借这些进展,GenAISys 已准备好进行实时、生产级的决策。下一章将重点介绍如何扩展 GenAISys 架构的功能,以实现即时、高吞吐量的 AI 操作。
With these advances, the GenAISys is ready for real-time, production-grade decision-making. The next chapter will focus on scaling the functionality of our GenAISys architecture for immediate, high-throughput AI operations.
新框架、演进的架构、研究成果发布、生产故障——AI_Distilled每周都会为从事 LLM 和 GenAI 系统实战的工程师和研究人员提供一份简报,帮您过滤掉各种干扰信息。立即订阅,即可免费获得电子书,以及每周更新的深度见解,助您保持专注,掌握最新资讯。
New frameworks, evolving architectures, research drops, production breakdowns—AI_Distilled filters the noise into a weekly briefing for engineers and researchers working hands-on with LLMs and GenAI systems. Subscribe now and receive a free eBook, along with weekly insights that help you stay focused and informed.
订阅请访问https://packt.link/TRO5B或扫描下方二维码。
Subscribe at https://packt.link/TRO5B or scan the QR code below.
DeepSeek-V3 技术报告于 2024 年 12 月发布,一个月后发布了 DeepSeek-R1 论文以及一整套开源资源。此次发布在人工智能社区引起了巨大震动:Hugging Face 的下载量呈爆炸式增长,DeepSeek 应用登上应用商店排行榜榜首,新的 API 提供商一夜之间涌现。各国政府就暂停发布开源软件展开辩论,而主要的生成式人工智能厂商——OpenAI、X(及其 Grok 3)等——则加速推进。短短几周内,我们就看到 o3 版本改进了 OpenAI 模型,这清晰地表明人工智能竞赛已进入新阶段。与此同时,现实世界的人工智能生产团队目睹了这些令人眼花缭乱的创新层出不穷,颠覆了现有的人工智能系统。那些花费数月时间将系统适配到某个生成式人工智能模型的团队,发现自己陷入了灰色地带:系统功能尚可,但仍有改进空间。
The DeepSeek-V3 Technical Report arrived in December 2024, followed a month later by the DeepSeek-R1 paper and a full set of open source resources. The release sent a shockwave through the AI community: download counts on Hugging Face exploded, DeepSeek apps topped store charts, and new API providers sprang up overnight. Governments debated moratoriums while the major generative AI players—OpenAI, X (with Grok 3), and others—stepped on the gas. Within weeks, we saw o3 versions improve OpenAI models, a clear signal that the AI race had entered a new phase. At the same time, real-world AI production teams watched these dizzying innovations pile up, disrupting existing AI systems. Teams that spent months adapting their systems to one generative AI model found themselves caught in a gray area: their systems work, but could still be improved.
那么,我们应该怎么做?我们是否应该升级一个稳定的GenAISys系统,以跟上快速发展的AI市场的最新趋势,尽管这意味着要承担相应的成本和风险?或者,如果我们的系统稳定,是否应该忽略最新的模型?如果我们忽视技术发展,我们的系统可能会过时。如果我们一味追随潮流,我们的系统又会变得不稳定!
So, what should we do? Should we upgrade a stable GenAISys to follow the latest trend in an accelerating AI market with the cost and risks entailed? Or should we ignore the latest models if our system is stable? If we ignore evolutions, our system may become obsolete. If we keep following the trends, our system will become unstable!
本章将展示如何找到一个可行的平衡点。我们无需为每次模型升级或新增功能而重写整个环境,而是引入一种处理程序选择机制,将用户请求在正确的时间路由到正确的工具。处理程序注册表存储我们开发的每个 AI 功能,选择层会检查每条传入消息并触发相应的处理程序。凭借这种设计,GenAISys 可以无限演进而不会破坏堆栈的稳定性。本章首先将通过产品设计和生产用例来阐述如何在模型演进和实际应用之间找到平衡点。接下来,我们将简要介绍 DeepSeek-V3、DeepSeek-R1 以及我们将要实现的精简版 Llama 模型。然后,我们将使用 Hugging Face 在本地安装 DeepSeek-R1-Distill-Llama-8B,将其封装到一个可重用的函数中,再将其接入我们的 GenAISys 系统。届时,我们将构建灵活、可扩展的处理程序选择机制环境,以便为每个项目激活所需的模型和任务。在本章结束时,您将能够完全掌控 GenAISys 系统,并为应对人工智能市场带来的任何挑战做好准备。
This chapter shows how to strike a workable balance. Instead of rewriting entire environments for every model upgrade or new functionality, we introduce a handler-selection mechanism that routes user requests to the right tool at the right time. A handler registry stores every AI function we develop; the selection layer inspects each incoming message and triggers the appropriate handler. With this design, the GenAISys can evolve indefinitely without destabilizing the stack. We will begin the chapter by defining how a balanced approach can be found between model evolutions and real-world usage, illustrated through a product design and production use case. Next comes a concise look at DeepSeek-V3, DeepSeek-R1, and the distilled Llama model we’ll implement. Then, we’ll install DeepSeek-R1-Distill-Llama-8B locally with Hugging Face, wrap it in a reusable function, and then plug it into our GenAISys. At that point, we will develop the flexible, scalable environment of the handler-selection mechanism to allow us to activate the models and tasks we need for each project. By the end of the chapter, you will be able to have full control over the GenAISys and be ready for whatever the AI market throws at you.
本章涵盖以下主题:
This chapter covers the following topics:
我们首先来界定人工智能持续发展与日常业务应用之间的平衡点。
Let’s start by defining the balance between relentless AI evolution and day-to-day business usage.
在盲目采用每一种新模型之前,我们必须将决策建立在项目需求之上。到目前为止,我们的 GenAISys 主要服务于一家在线旅行社的营销职能。现在,假设这家旅行社发展壮大,足以推出一系列品牌商品——定制旅行包、小册子和其他小礼品。为了管理这项新业务,公司聘请了一位产品设计师兼生产经理(PDPM)。PDPM 研究客户反馈并设计个性化套件,但很快发现人工智能可以同时提高创造力和产量。
Before racing to adopt every new model, we must anchor our decisions on project needs. So far, our GenAISys has served mostly marketing functions for an online travel agency. Now, imagine that the agency has grown large enough to fund a line of branded merchandise—custom travel bags, booklets, and other goodies. To manage this new venture, the company hires a product designer and production manager (PDPM). The PDPM studies customer feedback and designs personalized kits but quickly sees that AI could boost both creativity and throughput.
因此,本章的示例主要集中在产品设计和生产工作流程上。我们的目标并非强制所有任务都使用 DeepSeek(或其他任何模型),而是选择最符合需求的模型。为此,我们将扩展 GenAISys,添加一个处理器选择机制,该机制能够响应 IPython 界面中的用户选择以及每条消息中的关键字。根据具体情况,运维团队可以配置系统,将请求路由到 GPT-4o、DeepSeek 或任何未来的模型。
The examples in this chapter thus focus on product design and production workflows. Our goal is not to force DeepSeek (or any other model) into every task but to choose the model that best fits the need. To do that, we’ll extend the GenAISys with a handler-selection mechanism that responds to user choices in the IPython interface and to keywords in each message. Depending on the situation, the operations team can configure the system to route requests to GPT-4o, DeepSeek, or any future model.
在将 DeepSeek 连接到我们的 GenAISys 之前,让我们回顾一下 DeepSeek 型号系列。
Before wiring DeepSeek into our GenAISys, let’s review the DeepSeek model family.
DeepSeek 的发展历程始于 DeepSeek-V3,随后发展到 DeepSeek-R1(一个以推理为重点的升级版本),然后衍生出基于 Qwen 和 Llama 架构的精简版本,如图7.1所示。V3 使该模型声名鹊起,而 R1 则引入了强大的推理功能。
DeepSeek’s journey began with DeepSeek-V3, advanced to DeepSeek-R1—a reasoning-focused upgrade—and then branched into distilled variants built on Qwen and Llama architectures, as shown in Figure 7.1. V3 was responsible for putting the model on the map, and it was R1 that brought in robust reasoning.
图 7.1:DeepSeek 开发周期
Figure 7.1: DeepSeek development cycle
根据 DeepSeek-AI 等人 (2024) 的研究,V3 模型实现了显著的效率提升。其完整的训练预算仅为 278.8 万 H800 GPU 小时(按每 GPU 小时 2 美元计算,约合 560 万美元)——对于现代前沿模型而言,这一成本非常低。即使按每个令牌计算,成本也很低,每万亿个令牌仅需 18 万 GPU 小时。因此,与通常报道的大规模模型相比,其成本非常经济。
According to DeepSeek-AI et al. (2024), V3 delivered striking efficiency gains. Its full training budget was only 2.788 million H800 GPU-hours (≈ USD 5.6 million at USD 2 per GPU-hour)—remarkably low for a modern frontier model. Even on a per-token basis, the cost is lean, needing just 180 K GPU-hours per trillion tokens. The cost is, therefore, very economical compared to what is typically reported for large-scale models.
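根据上述数字做一个简单的核算(示例性计算;每 GPU 小时 2 美元是论文采用的假设费率):
A quick sanity check on the figures above (an illustrative calculation; the USD 2 per GPU-hour rate is the paper's assumption):

```python
# Back-of-the-envelope check of the published DeepSeek-V3 training figures
total_gpu_hours = 2_788_000          # full H800 GPU-hour training budget
usd_per_gpu_hour = 2                 # assumed rental rate from the report
print(f"Total cost: ${total_gpu_hours * usd_per_gpu_hour:,}")  # Total cost: $5,576,000

# The per-token efficiency figure implies how many trillion tokens the
# full budget could cover at that rate
gpu_hours_per_trillion_tokens = 180_000
tokens_covered = total_gpu_hours / gpu_hours_per_trillion_tokens
print(f"Trillion tokens covered: {tokens_covered:.1f}")  # Trillion tokens covered: 15.5
```

The roughly 15.5 trillion tokens implied here is only a derived figure obtained from the two numbers quoted above, not a statistic from the report itself.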
当我们查看arXiv上DeepSeek-V3技术报告(2024)的作者列表(https://arxiv.org/abs/2412.19437 )时,首先注意到该报告由超过150位专家撰写!仅此一点就足以证明开源方法的有效性:它通过向所有愿意贡献想法的人开放,汇集集体力量,共同构建以效率为导向的架构。附录A中的贡献与致谢列表正是对开源发展的致敬。
When we examine the list of authors of the DeepSeek-V3 Technical Report (2024) on arXiv, https://arxiv.org/abs/2412.19437, we first notice that more than 150 specialists wrote the paper! In itself, this factor alone proves the efficiency of open source approaches that involve collective efforts to produce efficiency-driven architectures by opening ideas to every person willing to contribute. The list of Contributions and Acknowledgements in Appendix A is a tribute to open source developments.
图 7.2:DeepSeek-R1 由 DeepSeek-V3 衍生而来
Figure 7.2: DeepSeek-R1 is derived from DeepSeek-V3
DeepSeek-R1 直接源自 DeepSeek-V3。团队希望保留 V3 的强大功能,同时降低推理的复杂性,因此他们设计模型在推理过程中仅激活一个最小的专家子集,如图7.2所示。此外,训练过程也保持了同样的精简。R1 直接进入强化学习阶段,无需任何监督式微调。虽然推理能力很强,但在经典的自然语言处理 (NLP) 任务中仍存在局限性。为了避免神经网络的训练循环,团队引入了基于规则的奖励机制。训练提示语使用清晰的<think> … <answer>标签进行结构化处理,避免将偏见带入模型的最终答案。此外,强化学习过程从包含推理链( CoT ) 示例的冷启动数据开始。这种方法降低了训练时间和成本。
DeepSeek-R1 grew straight out of DeepSeek-V3. The team wanted V3’s punch, but with feather-weight inference, so they wired the model to activate only a minimal subset of experts during inference, as shown in Figure 7.2. Furthermore, training stayed just as lean. R1 jumped directly into reinforcement learning with no supervised fine-tuning. The reasoning was high but faced limitations for classic NLP tasks. Rule-based rewards were introduced to avoid the neural network’s training cycles. The training prompts were structured with neat <think> … <answer> tags, avoiding the smuggling of biases into the model’s final answer. Moreover, the reinforcement learning process began with cold-start data containing chain of thought (CoT) examples focusing on reasoning. This approach reduced training time and costs.
DeepSeek 通过改进 MoE 策略并集成多标记预测,演化为 R1,显著提升了准确率和效率。最终,DeepSeek-R1 被用于增强 DeepSeek-V3 的推理功能。DeepSeek-R1 也被提炼成更小的模型,例如 Llama 和 Qwen。所采用的技术是知识蒸馏,即一个更小的“学生”模型(本章中为 Llama)向一个“教师”模型(本章中为 DeepSeek-R1)学习。这种方法的有效性在于,它能够教会学生模型达到与教师模型相似的性能,同时效率更高,更适合部署在资源受限的设备上,正如您将在本章中看到的那样。
DeepSeek evolved to R1 by refining MoE strategies and integrating multi-token prediction, significantly enhancing both accuracy and efficiency. Finally, DeepSeek-R1 was used to enhance DeepSeek-V3 with reasoning features. DeepSeek-R1 was also distilled into smaller models such as Llama and Qwen. The technique used was knowledge distillation, where a smaller “student” model (in this chapter, Llama) learns from a “teacher” model (in this chapter, DeepSeek-R1). This approach is effective in that it teaches the student model to achieve performance similar to that of the teacher while being more efficient and suitable for deployment on resource-constrained devices, which will be the case in this chapter, as you’ll see.
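上文描述的师生蒸馏目标可以用数值草图来说明:让学生模型去拟合教师模型经温度软化后的输出分布(这只是知识蒸馏的通用示意,并非 DeepSeek 的具体训练配方):
The teacher-student objective described above can be sketched numerically: the student is trained to match the teacher's temperature-softened output distribution (a generic illustration of knowledge distillation, not DeepSeek's exact recipe):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution, softened by temperature."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)   # teacher targets
    q = softmax(student_logits, temperature)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; diverging logits increase it
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

Minimizing this divergence over many examples is what lets an 8B student approximate the behavior of a much larger teacher.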
让我们安装并运行 DeepSeek-R1-Distill-Llama-8B,并将其插入我们的 GenAISys 中。
Let’s install and run DeepSeek-R1-Distill-Llama-8B and plug it into our GenAISys.
在本节中,我们将实现 DeepSeek-R1-Distill-Llama-8B,它是 DeepSeek-R1 的一个精简版本,如图 7.3 所示。我们将安装 Hugging Face 的开源 Transformers 库,这是一个用于使用和微调预训练 Transformer 模型的开放框架。
In this section, we will implement DeepSeek-R1-Distill-Llama-8B, a distilled version of DeepSeek-R1, as shown in Figure 7.3. We will install Hugging Face’s open-source Transformers library, an open framework for using and fine-tuning pre-trained transformer models.
图 7.3:安装 DeepSeek-R1-Distill-Llama-8B,它是 DeepSeek-R1 的精简版
Figure 7.3: Installing DeepSeek-R1-Distill-Llama-8B, a distilled version of DeepSeek-R1
我们将使用 Hugging Face 文档中介绍的 DeepSeek-R1-Distill-Llama-8B 模型:https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B。Hugging Face 还提供了该模型的使用建议:https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B#usage-recommendations。
We will be using the DeepSeek-R1-Distill-Llama-8B model documented by Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B. Hugging Face also provides usage recommendations for this model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B#usage-recommendations.
我们将下载的是由 Unsloth(一家 LLM 加速器)在 Hugging Face 上提供的 DeepSeek-R1 开源精简版:https://unsloth.ai/。因此,我们不会使用 DeepSeek API,而只会使用本地安装的开源版本,该版本不与网络交互,并利用了 Hugging Face 的 SOC 2 Type 2 认证,符合隐私和安全约束:https://huggingface.co/docs/inference-endpoints/en/security。
The version we will download is an open source distilled version of DeepSeek-R1 provided by Unsloth, an LLM accelerator, on Hugging Face: https://unsloth.ai/. We will thus not use a DeepSeek API but only a locally installed open source version that does not interact with the web, leveraging Hugging Face’s SOC 2 Type 2 certification that complies with privacy and security constraints: https://huggingface.co/docs/inference-endpoints/en/security.
要在较新的机器上本地安装 deepseek-ai/DeepSeek-R1-Distill-Llama-8B,建议配备约 20 GB 的内存。内存可以略少一些,但最好避免风险。同时,也建议配备约 20 GB 的磁盘空间。
To install deepseek-ai/DeepSeek-R1-Distill-Llama-8B locally on a recent machine, it is recommended to have about 20 GB of RAM. A bit less is possible, but it is best to avoid the risk. About 20 GB of disk space is also recommended.
要在 Google Colab 上安装 DeepSeek-R1-Distill-Llama-8B,建议使用 Google Colab Pro 以获取足够的 GPU 内存和算力。在本节中,Hugging Face 模型将下载到 Google 云端硬盘,并通过 Google Colab 挂载。所需磁盘空间将超出免费版 Google 云端硬盘的容量,可能需要 Google 云端硬盘的最低档付费订阅。请在 Google Colab 上安装前确认费用。
To install DeepSeek-R1-Distill-Llama-8B on Google Colab, it is recommended to use Google Colab Pro to obtain GPU memory and power. For this section, the Hugging Face model is downloaded on Google Drive, which is mounted through Google Colab. The disk space required will exceed the free version of Google Drive, and a minimal subscription to Google Drive may be required. Check the costs before installing on Google Colab.
打开 GitHub 上 Chapter07 目录中的 Getting_started_with_DeepSeek_R1_Distill_Llama_8B.ipynb(https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main)。我们将遵循 Hugging Face 框架的标准流程:
Open Getting_started_with_DeepSeek_R1_Distill_Llama_8B.ipynb within the Chapter07 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). We will follow the standard procedure of the Hugging Face framework:
install_deepseek=True
install_deepseek=True
install_deepseek=False
install_deepseek=False
有了模型,我们就可以将其封装在一个处理程序中,并在下一节中将其插入到我们的 GenAISys 中。
With the model in place, we can wrap it in a handler and plug it into our GenAISys in the next section.
我们将首先安装 DeepSeek-R1-Distill-Llama-8B(本地或在 Colab 中),然后运行快速推理以确认一切正常。
We’ll begin by installing DeepSeek-R1-Distill-Llama-8B (locally or in Colab) and then run a quick inference to confirm everything works.
我们将在第一次会话中首先安装 DeepSeek:
We will first install DeepSeek in the first session:
# Set install_deepseek to True to download and install R1-Distill-Llama-8B locally
# Set install_deepseek to False to run an R1 session
install_deepseek=True
GPU需要激活,我们来检查一下:
The GPU needs to be activated, so let’s check it:
Checking GPU activation
!nvidia-smi
如果是在 Google Colab 上安装,我们可以挂载 Google 云端硬盘:
If we are installing on Google Colab, we can mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
我们现在在 Google 云端硬盘中设置缓存目录,并设置相应的环境变量:
We now set the cache directory in Google Drive and set the corresponding environment variables:
import os
# Define the cache directory in your Google Drive
cache_dir = '/content/drive/MyDrive/genaisys/HuggingFaceCache'
# Set environment variables to direct Hugging Face to use this cache directory
os.environ['TRANSFORMERS_CACHE'] = cache_dir
#os.environ['HF_DATASETS_CACHE'] = os.path.join(cache_dir, 'datasets')
现在我们可以安装 Hugging Face Transformers 库了:
We can now install the Hugging Face Transformers library:
!pip install transformers==4.48.3
这样一来,我们就可以下载模型了。
With that, we are ready to download the model.
现在,让我们在 Hugging Face 框架内从 unsloth/DeepSeek-R1-Distill-Llama-8B 下载分词器和模型:
Let’s now download the tokenizer and model from unsloth/DeepSeek-R1-Distill-Llama-8B within the Hugging Face framework:
from transformers import AutoTokenizer, AutoModelForCausalLM
import time
if install_deepseek==True:
# Record the start time
start_time = time.time()
model_name = 'unsloth/DeepSeek-R1-Distill-Llama-8B'
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name, device_map='auto', torch_dtype='auto'
)
# Record the end time
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to load the model: {elapsed_time:.2f} seconds")
下载时间将会显示,具体取决于您的网络连接速度和 Hugging Face 的下载速度。安装完成后,请确认所有文件都已安装到您的本地目录中。具体情况如下:
The download time will be displayed and also depends on your internet connection and Hugging Face’s download speed. Once installed, verify that everything is installed in your local directory. In this case, it is as follows:
if install_deepseek==True:
!ls -R /content/drive/MyDrive/genaisys/HuggingFaceCache
输出结果应显示已下载的文件:
The output should show the files downloaded:
/content/drive/MyDrive/genaisys/HuggingFaceCache:
models--unsloth--DeepSeek-R1-Distill-Llama-8B version.txt
/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B:
blobs refs snapshots
/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/blobs:
03910325923893259d090bfa92baa4088cd46573…
现在,让我们运行一次 DeepSeek 会话。
Now, let’s run a DeepSeek session.
为了确保模型已正确安装,同时避免在启动新会话时覆盖已安装的模型,请返回笔记本顶部并进行以下设置:
To make sure the model is correctly installed and also to avoid overwriting the installation when starting a new session, go back to the top of the notebook and set the following:
install_deepseek=False
现在我们将把DeepSeek-R1-Distill-Llama-8B分词器和模型加载到本地:
We will now load the DeepSeek-R1-Distill-Llama-8B tokenizer and model locally:
import time
from transformers import AutoTokenizer, AutoModelForCausalLM
if install_deepseek==False:
# Define the path to the model directory
model_path = '/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/snapshots/71f34f954141d22ccdad72a2e3927dddf702c9de'
# Record the start time
start_time = time.time()
# Load the tokenizer and model from the specified path
tokenizer = AutoTokenizer.from_pretrained(
model_path, local_files_only=True
)
model = AutoModelForCausalLM.from_pretrained(
model_path, device_map='auto', torch_dtype='auto',
local_files_only=True
)
# Record the end time
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to load the model: {elapsed_time:.2f} seconds")
加载模型所需的时间将会显示,具体取决于您机器的配置:
The time it took to load the model is displayed and will depend on the configuration of your machine:
Time taken to load the model: 14.71 seconds
我们可以看一下这个 Llama 模型的配置:
We can have a look at the configuration of the Llama model:
if install_deepseek==False:
print(model.config)
输出结果显示了一些有趣的信息。LlamaConfig读数证实我们正在运行一个结构紧凑、范围明确的模型:
The output shows interesting information. The LlamaConfig readout confirms we are running a compact, well-scoped model:
LlamaConfig {
"_attn_implementation_autoset": true,
"_name_or_path": "/content/drive/MyDrive/genaisys/HuggingFaceCache/models--unsloth--DeepSeek-R1-Distill-Llama-8B/snapshots/71f34f954141d22ccdad72a2e3927dddf702c9de",
"architectures": [
"LlamaForCausalLM"
],
…
精简后的 Llama 模型有 32 个 Transformer 层,每层 32 个注意力头,总共 1024 个注意力头。此外,它还包含 80 亿个参数。相比之下,它的教师模型DeepSeek-R1是一个 MoE 巨兽,拥有61 层和惊人的6710 亿个参数,其中约有370 亿个参数在每次前向传播中处于激活状态。现在,让我们运行一个示例,并提示一个生产环境中的问题:
The distilled Llama model has 32 transformer layers and 32 attention heads per layer, totaling 1,024 attention heads. Also, it contains 8 billion parameters. By contrast, its teacher model, DeepSeek-R1, is an MoE giant with 61 layers and a massive 671 billion parameters, of which about 37 billion are active on each forward pass. Let’s now run an example with a prompt for a production issue:
if install_deepseek==False:
prompt="""
Explain how a product designer could transform customer requirements for a traveling bag into a production plan.
"""
我们首先插入时间测量信息,并使用 GPU 对输入进行标记化:
We first insert time measurement and tokenize the input using the GPU:
import time
if install_deepseek==False:
# Record the start time
start_time = time.time()
# Tokenize the input
inputs = tokenizer(prompt, return_tensors='pt').to('cuda')
然后,我们运行生成过程:
Then, we run the generation:
# Generate output with enhanced anti-repetition settings
outputs = model.generate(
**inputs,
max_new_tokens=1200,
repetition_penalty=1.5, # Increase penalty to 1.5 or higher
no_repeat_ngram_size=3, # Prevent repeating n-grams of size 3
temperature=0.6, # Reduce randomness slightly
top_p=0.9, # Nucleus sampling for diversity
top_k=50 # Limits token selection to top-k probable tokens
)
我们设置这些参数的目的是减少重复并保持聚焦:
The goal of our parameters is to limit the repetitions and remain focused:
max_new_tokens=1200:限制输出标记的数量
max_new_tokens=1200: To limit the number of output tokens
repetition_penalty=1.5:限制重复(可以设得更高)
repetition_penalty=1.5: To limit the repetitions (can be higher)
no_repeat_ngram_size=3:防止重复出现特定大小的 n-gram
no_repeat_ngram_size=3: To prevent repeating n-grams of a particular size
temperature=0.6:减少随机性并保持聚焦
temperature=0.6: To reduce randomness and stay focused
top_p=0.9:启用核采样以保证多样性
top_p=0.9: Allows nucleus sampling for diversity
top_k=50:将下一个标记的选择范围限制在概率最高的 top_k 个标记内
top_k=50: Limits token selection to the top_k most probable tokens
这组参数既能减少重复,又能保证多样性。现在我们可以使用分词器解码生成的文本:
This set of parameters tends to limit repetitions while allowing diversity. We can now decode the generated text with the tokenizer:
# Decode and display the output
generated_text = tokenizer.decode(
outputs[0], skip_special_tokens=True
)
# Record the end time
end_time = time.time()
# Calculate the elapsed time
elapsed_time = end_time - start_time
print(f"Time taken to generate the response: {elapsed_time:.2f} seconds")
输出结果显示了模型思考和响应所花费的总时间:
The output shows the overall time it took the model to think and respond:
Time taken to generate the response: 20.61 seconds
让我们将 generated_text 换行包装并展示出来:
Let’s wrap generated_text and display it:
import textwrap
if install_deepseek==False:
wrapped_text = textwrap.fill(generated_text, width=80)
print(wrapped_text)
输出结果提供了所需的思路,展示了 DeepSeek-R1 的思考能力:
The output provides ideas as requested. It displays DeepSeek-R1’s thinking abilities:
…Once goals & priorities become clearer, developing
prototypes becomes more focused since each iteration would aim at testing one main feature rather than multiple changes simultaneously—which makes refining individual elements easier before moving towards finalizing designs, When prototyping starts: 1) Start with basic functional mockups using simple tools –…
在本节中,我们将分几步把 DeepSeek-R1-Distill-Llama-8B 添加到我们的 GenAISys 中。打开 GenAISys_DeepSeek.ipynb。您可以选择在第一个单元格中启用 DeepSeek 运行笔记本,这将需要 GPU:
In this section, we will add DeepSeek-R1-Distill-Llama-8B to our GenAISys in a few steps. Open GenAISys_DeepSeek.ipynb. You can decide to run the notebook with DeepSeek in the first cell, which will require a GPU:
# DeepSeek activation: set deepseek=True to activate. ~20 GB (estimate) GPU memory and 30-40 GB disk space
deepseek=True
您也可以选择不在本笔记本中运行 DeepSeek,在这种情况下,您将不需要 GPU,并且可以将运行时更改为 CPU。如果您选择此选项,OpenAI 的 API 将接管,并确认不需要 GPU:
You can also decide not to run DeepSeek in this notebook, in which case, you will not need a GPU and can change the runtime to CPU. If you decide on this option, OpenAI’s API will take over, confirming that no GPU is required:
deepseek=False
现在,请转到笔记本的“设置 DeepSeek Hugging Face 环境”子章节。我们只需将以下单元格从 Getting_started_with_DeepSeek_R1_Distill_Llama_8B.ipynb 复制到该子章节。以下代码仅在 deepseek=True 时才会激活:
Now, go to the Setting up the DeepSeek Hugging Face environment subsection of the notebook. We will simply transfer the following cells from Getting_started_with_DeepSeek_R1_Distill_Llama_8B.ipynb to this subsection. The following code will only be activated if deepseek=True:
!nvidia-smi
…
os.environ['TRANSFORMERS_CACHE'] = cache_dir
…
!pip install transformers==4.48.3
from transformers import AutoTokenizer, AutoModelForCausalLM
# Define the path to the model directory
model_path = …
安装现已完成。如果 deepseek==True,则会在“AI 函数”部分使用“运行 DeepSeek-R1-Distill-Llama-8B 会话”一节所述的参数调用 DeepSeek 模型:
The installation is now complete. The calls to the DeepSeek model will be made in the AI Functions section if deepseek==True, with the parameters described in the Running a DeepSeek-R1-Distill-Llama-8B session section:
if models == "DeepSeek":
# Tokenize the input
inputs = tokenizer(sc_input, return_tensors='pt').to('cuda')
….
task_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
DeepSeek 运行正常后,我们就可以构建处理程序选择机制了,该机制会将每个用户请求路由到 GPT-4o、DeepSeek 或任何未来的模型,而不会触及堆栈的其余部分。
With DeepSeek functioning, we’re ready to build the handler selection mechanism, which will route every user request to GPT-4o, DeepSeek, or any future model—without touching the rest of the stack.
这家在线旅行社的需求不断增长,需要设计和生产大量的商品套装,包括旅行包、小册子和笔。PDPM 希望直接参与 GenAISys 的开发,探索如何利用该系统显著提高生产效率。
The PDPM at the online travel agency is experiencing increased demands, requiring the agency to design and produce large quantities of merchandise kits, including travel bags, booklets, and pens. The PDPM wants to be directly involved in the GenAISys development to explore how it can significantly boost productivity.
鉴于系统中人工智能任务的复杂性和多样性不断增加,GenAISys 开发团队决定使用处理程序来组织这些任务,如图7.4所示:
Given the growing complexity and variety of AI tasks in the system, the GenAISys development team has decided to organize these tasks using handlers, as illustrated in Figure 7.4:
图 7.4:GenAISys 数据流和组件交互
Figure 7.4: GenAISys data flow and component interaction
因此,我们将定义并实现增强型 GenAISys,然后邀请 PDPM 运行该系统,以评估旨在提高商品设计和生产效率的功能。
We’ll, therefore, define, implement, and then invite the PDPM to run the enhanced GenAISys to evaluate functions aimed at improving productivity in merchandise design and production.
图 7.4描述了我们将要实现的处理程序管道的行为:
Figure 7.4 describes the behavior of the handler pipeline we are going to implement:
在此设置中,用户通过 IPython 接口提供输入。该输入会被路由到一个处理器选择机制,该机制会评估已注册的、与特定条件对应的可用处理器。注册表中的每个条目都是一个(条件,处理器)对,负责不同的操作,例如推理、图像生成或数据分析。一旦找到匹配的条件,相应的 AI 函数就会被激活。处理完成后,AI 会将结果返回给接口。从用户输入到 AI 生成的响应,整个结构化的流程处理得非常流畅,每个处理器都经过清晰定义,以提高代码的可读性和效率。
In this setup, a user provides input through the IPython interface. This input is routed into a handler selection mechanism, which then evaluates the available handlers registered alongside specific conditions. Each entry in the registry is a (condition, handler) pair responsible for different operations such as reasoning, image generation, or data analysis. Once a matching condition is found, the corresponding AI function is activated. After processing, it returns the results to the interface. This structured pipeline—from user input through to the AI-generated response—is handled gracefully, with each handler clearly defined for readability and efficiency.
在编写代码之前,让我们先明确定义 GenAISys 中的“处理程序”的含义。
Before coding, let’s clearly define what we mean by a “handler” in the GenAISys.
从本质上讲,处理程序是一个专门负责处理特定任务或请求类型的函数。每个处理程序都注册了一个条件,通常是一个小函数或 lambda 表达式。当该条件被评估为 True 时,就表示应该调用关联的处理程序。这种设计巧妙地将“决定运行哪个处理程序”的逻辑与“处理程序如何执行任务”解耦。
A handler is essentially a specialized function responsible for addressing specific tasks or types of requests. Each handler is registered alongside a condition, typically a small function or lambda expression. When evaluated as True, this condition indicates that the associated handler should be invoked. This design neatly decouples the logic for deciding which handler should run from how the handler executes its task.
在我们的场景中,处理程序是编排器的基本构建模块,即用于处理特定输入类型的条件函数。当用户提供输入时,处理程序选择机制会将其与处理程序注册表进行比对,该注册表由条件和处理程序的配对组成。一旦找到匹配项,相应的处理程序就会被触发,调用诸如 handle_generation、handle_analysis 或 handle_pinecone_rag 之类的专用函数。这些处理程序执行复杂的推理、数据检索或内容生成任务,提供精准的目标输出。
In our context, handlers are the orchestrator’s building blocks—conditional functions designed to process specific input types. When a user provides input, the handler selection mechanism evaluates it against the handler registry, which consists of pairs of conditions and handlers. Upon finding a match, the corresponding handler is triggered, invoking specialized functions such as handle_generation, handle_analysis, or handle_pinecone_rag. These handlers execute sophisticated reasoning, data retrieval, or content generation tasks, providing precise and targeted outputs.
但是,对于我们的 GenAISys 来说,处理程序究竟比传统的 if…then 条件列表有什么优势呢?
But why exactly is a handler better for our GenAISys than a traditional list of if…then conditions?
使用处理程序提高了代码的可维护性和可读性。不再将多个if...then检查分散在代码各处,每个处理程序都是独立的:它包含自己的条件和一个执行所需操作的独立函数。这种结构使得添加、删除或修改处理程序变得更加容易,避免了在冗长的条件语句链中出现意外交互的风险。此外,由于它将“我们需要哪个处理程序?”的逻辑与“该处理程序如何实际工作?”的逻辑分离,我们得到了一个更加模块化的设计,从而实现了无缝扩展。
Using handlers improves maintainability and readability. Instead of scattering multiple if...then checks across the code, each handler is self-contained: it has its condition and a separate function that carries out the required action. This structure makes it easier to add, remove, or modify handlers without risking unintended interactions in a chain of lengthy conditionals. Additionally, since it separates the logic of “which handler do we need?” from “how does that handler actually work?” we’re left with a more modular design that makes scaling seamless.
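上文描述的(条件,处理程序)注册表模式可以用如下最小示例来勾勒(处理程序名称和匹配条件均为示意性假设,并非本书的具体实现):
The (condition, handler) registry pattern described above can be sketched minimally as follows (handler names and matching conditions are illustrative assumptions, not the book's exact implementation):

```python
# Illustrative handlers; each is a self-contained function for one task type
def handle_generation(message):
    return f"[generation] {message}"

def handle_analysis(message):
    return f"[analysis] {message}"

def handle_default(message):
    return f"[default] {message}"

# The registry: each entry pairs a condition (a small lambda) with the
# handler it triggers, decoupling "which handler?" from "how does it work?"
HANDLER_REGISTRY = [
    (lambda msg: "generate" in msg.lower(), handle_generation),
    (lambda msg: "analysis" in msg.lower() or "analyze" in msg.lower(), handle_analysis),
]

def select_handler(message):
    """Return the first handler whose condition evaluates to True."""
    for condition, handler in HANDLER_REGISTRY:
        if condition(message):
            return handler
    return handle_default

print(select_handler("Run a memory analysis")("Run a memory analysis"))
# [analysis] Run a memory analysis
```

Adding a new capability then means appending one (condition, handler) pair, leaving all existing entries untouched.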
我们将首先介绍对 IPython 接口的修改。
We will first go through the modifications to our IPython interface.
我们将首先回顾 IPython 接口的主要更新,该接口仍然是主要的交互点,如图 7.5 所示。从用户角度来看,引入处理程序并不会显著改变界面,但需要对一些底层代码进行调整。
We’ll start by reviewing the primary updates to our IPython interface, which remains the main interaction point, as shown in Figure 7.5. From a user perspective, the introduction of handlers doesn’t alter the interface significantly, but some underlying code adjustments are necessary.
图 7.5:IPython 接口处理用户输入并显示输出
Figure 7.5: The IPython interface processes the user input and displays the output
IPython 接口与之前一样调用 chat_with_gpt:
The IPython interface calls chat_with_gpt as before:
response = chat_with_gpt(
user_histories[active_user], user_message, pfiles,
active_instruct, models=selected_model
)
然而,现在我们可以通过以下方式明确选择 OpenAI 模型或 DeepSeek 模型:
Now, however, we can explicitly select either an OpenAI or a DeepSeek model with the following:
models=selected_model
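这个参数在响应逻辑内部驱动一个简单的分支。其概念可以如下勾勒(两个后端函数均为占位示意,并非笔记本中的实际调用):
This parameter drives a simple branch inside the response logic, which can be sketched as follows (both backend functions are stand-ins, not the notebook's actual calls):

```python
def openai_generate(sc_input):
    # Stand-in for the OpenAI API call used elsewhere in the notebook
    return f"OpenAI response to: {sc_input}"

def deepseek_generate(sc_input):
    # Stand-in for the local DeepSeek tokenizer/model pipeline
    return f"DeepSeek response to: {sc_input}"

def route_model_call(models, sc_input):
    """Route the request to the backend selected in the interface."""
    if models == "DeepSeek":
        return deepseek_generate(sc_input)
    return openai_generate(sc_input)

print(route_model_call("DeepSeek", "Design a travel bag"))
# DeepSeek response to: Design a travel bag
```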
要将模型添加到chat_with_gpt调用中,我们首先向界面添加一个下拉模型选择器:
To add the model to the chat_with_gpt call, we first add a drop-down model selector to the interface:
# Dropdown for model selection
model_selector = Dropdown(
options=["OpenAI", "DeepSeek"],
value="OpenAI",
description="Model:",
layout=Layout(width="50%")
)
模型选择器已添加到VBox接口中的实例中:
The model selector is added to the VBox instances in the interface:
# Display interactive widgets
display(
VBox(
[user_selector, input_box, submit_button, agent_checkbox,
tts_checkbox, files_checkbox, instruct_selector,
model_selector],
layout=Layout(display='flex', flex_flow='column',
align_items='flex-start', width='100%')
)
)
The user can now choose their preferred model directly from the interface, as shown here:
Figure 7.6: Selecting a model
An additional feature has been added to manage file displays.
There are many ways to design file management. We will introduce a function here that can be expanded during a project’s implementation phase as needed. Our file management code has three functions:
We will build the code to handle user interactions directly by observing changes in the checkbox widget of our interface within the Jupyter Notebook environment. The code will then delete a specific image file (c_image.png) when the user unchecks the checkbox named files_checkbox. This ensures that files are removed cleanly when they are no longer needed, preventing clutter and saving storage space.
We first define the function:
def on_files_checkbox_change(change):
The event handler function defines a callback function named on_files_checkbox_change that will execute when the state of files_checkbox changes. change is provided by the observer, which contains information about the change event, including the following:
- old: The previous state of the checkbox
- new: The new state of the checkbox
# Only remove images if the checkbox changed from True to False.
if change['old'] == True and change['new'] == False:
The code verifies whether the checkbox was previously checked (True) and has now been unchecked (False). This guarantees that the file deletion only occurs when the user explicitly unchecks the checkbox, preventing accidental file removal. We now remove the file:
if os.path.exists("c_image.png"):
os.remove("c_image.png")
We also need to add an observer to inform the on_files_checkbox_change function when there is a file status change:
# Attach the observer to files_checkbox
files_checkbox.observe(on_files_checkbox_change, names='value')
The files_checkbox.observe() function links the on_files_checkbox_change function to the files_checkbox widget. names='value' specifies that the function should be triggered when the value of the checkbox changes (i.e., when it is checked or unchecked).
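The callback logic can be exercised outside Jupyter by simulating the change dictionary the observer delivers. In this sketch, a temporary file stands in for c_image.png so we can verify that the file is removed only on the True-to-False transition:

```python
import os
import tempfile

# Simulating the widget callback without ipywidgets: the observer delivers
# a dict-like `change` object carrying 'old' and 'new' values.
# IMAGE_PATH is a temp-file stand-in for the notebook's c_image.png.
IMAGE_PATH = os.path.join(tempfile.gettempdir(), "c_image_demo.png")

def on_files_checkbox_change(change):
    # Only remove the image when the box goes from checked to unchecked
    if change['old'] == True and change['new'] == False:
        if os.path.exists(IMAGE_PATH):
            os.remove(IMAGE_PATH)

# Create a dummy file, then fire the two transitions
with open(IMAGE_PATH, "wb") as f:
    f.write(b"fake png bytes")

on_files_checkbox_change({'old': False, 'new': True})   # checking: file kept
assert os.path.exists(IMAGE_PATH)
on_files_checkbox_change({'old': True, 'new': False})   # unchecking: file removed
assert not os.path.exists(IMAGE_PATH)
```

The guard on both `old` and `new` is what prevents accidental deletion when the observer fires for the check event.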
We will now move on to the next part of the pipeline and implement the handler selection mechanism.
The handler selection mechanism dynamically selects and executes the appropriate handler based on predefined conditions. It iterates through available handlers, evaluating conditions until it finds a match, ensuring efficient and structured processing of the user input. The handler selection mechanism is in the chat_with_gpt function we built in the previous chapters. However, it now contains an orchestration task, as shown in Figure 7.7:
chat_with_gpt remains a pivotal function within the GenAISys and now contains the handler mechanism
Figure 7.7: The orchestration role of the handler mechanism
In the broader GenAISys workflow, the handler mechanism acts as an orchestrator. It processes user inputs and identifies which AI functions to activate. When the IPython interface captures user messages, the handler mechanism evaluates these inputs to determine the appropriate handler from the handler registry. If no specific handler matches, it defaults to a memory-based response, which is then returned to the IPython interface.
The chat_with_gpt function encapsulates this logic. It iterates through a predefined list of handlers, each paired with a corresponding condition function. When a condition evaluates to true, the associated handler is executed. If none match, the fallback memory-based handler ensures a seamless response:
def chat_with_gpt(
messages, user_message, files_status, active_instruct, models
):
global memory_enabled # Ensure memory is used if set globally
Let’s go through the parameters of the function:
- messages: The conversation history between the user and the AI
- user_message: The latest message from the user
- files_status: Tracks the status of any files involved in the conversation
- active_instruct: Any instruction or mode that might influence how responses are generated
- models: Specifies the active AI model in use
The function uses global memory_enabled to access a global variable that determines whether memory should be applied to store/remember the full dialogue of a user. In this chapter, global memory_enabled=True.
The function attempts to execute the appropriate handler based on the provided conditions:
try:
# Iterate over handlers and execute the first matching one
for condition, handler in handlers:
if condition(messages, active_instruct, memory_enabled,
models, user_message):
return handler(messages, active_instruct, memory_enabled,
models, user_message, files_status=files_status)
As you can see, for condition, handler in handlers iterates over a list called handlers, where each item is a tuple containing the following items:
- A condition function, which the loop evaluates as if condition(...) with the provided parameters
- The handler function to execute when that condition returns True
Let’s now add a fallback if no handlers match the input conditions:
# If no handler matched, default to memory handling with full conversation history
return handle_with_memory(
messages, # Now passing full message history
user_message,
files_status=files_status,
instruct=active_instruct,
mem=memory_enabled, # Ensuring memory usage
models=models
)
handle_with_memory is called as a default handler that does the following:
- Passes the full conversation history (messages)
- Applies memory when memory_enabled is true, which is the case in this chapter

Finally, let's add an exception handler to catch errors:
except Exception as e:
return f"An error occurred in the handler selection mechanism: {str(e)}"
With the handler selection mechanism defined, we can now proceed to build the handler registry that stores these handlers.
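Put together, the selection loop, fallback, and exception guard form a compact dispatcher. The following self-contained sketch mirrors the chat_with_gpt logic above with stub handlers standing in for the real AI functions (handle_pinecone_rag and handle_with_memory here just return strings for illustration):

```python
# Runnable sketch of the chat_with_gpt dispatch logic. The handler bodies
# are stubs; in the notebook they call the actual AI functions.
memory_enabled = True  # global memory flag, as in the chapter

def handle_pinecone_rag(user_message, **kwargs):
    return f"RAG handled: {user_message}"

def handle_with_memory(messages, user_message, **kwargs):
    return f"memory handled: {user_message}"

handlers = [
    (
        lambda msg, instruct, mem, models, user_message, **kwargs:
            "Pinecone" in user_message or "RAG" in user_message,
        lambda msg, instruct, mem, models, user_message, **kwargs:
            handle_pinecone_rag(user_message, models=models)
    ),
]

def chat_with_gpt(messages, user_message, files_status, active_instruct, models):
    global memory_enabled
    try:
        # Iterate over handlers and execute the first matching one
        for condition, handler in handlers:
            if condition(messages, active_instruct, memory_enabled,
                         models, user_message):
                return handler(messages, active_instruct, memory_enabled,
                               models, user_message, files_status=files_status)
        # If no handler matched, default to memory handling
        return handle_with_memory(messages, user_message,
                                  files_status=files_status,
                                  instruct=active_instruct,
                                  mem=memory_enabled, models=models)
    except Exception as e:
        return f"An error occurred in the handler selection mechanism: {str(e)}"

print(chat_with_gpt([], "Ask Pinecone about reviews", False, "None", "OpenAI"))
print(chat_with_gpt([], "Just chatting", False, "None", "OpenAI"))
```

The first call matches the Pinecone condition and is routed to the RAG stub; the second matches nothing and falls through to the memory-based fallback.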
The handler registry is a structured collection of condition-handler pairs, where each condition is a lambda function that evaluates user messages and instructions to determine whether specific criteria are met. When a condition is satisfied, the corresponding handler is triggered and executed immediately, as illustrated:
Figure 7.8: Creating the handler registry
All lambda functions take five positional parameters (msg, instruct, mem, models, and user_message), plus **kwargs. This ensures that the number of arguments matches when chat_with_gpt() calls a handler.
The handler registry has three main features:
We will design our handler registry with the following structure of four key properties:
The role of **kwargs in the code provides a flexible way to interact with the AI functions. **kwargs is short for keyword arguments and is used in Python functions to allow passing a variable number of arguments to a function. In the context of our handler registry code, **kwargs plays a crucial role by allowing handlers to accept additional, optional parameters without explicitly defining them in the function. It makes the handlers extensible for future updates or new parameters without requiring modifications to existing function signatures.
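A short sketch illustrates the flexibility this gives us. The handler name and parameters below are invented for the example; the point is that callers can pass extra keyword arguments without changing the function signature:

```python
# How **kwargs keeps handler signatures extensible. handle_example is an
# illustrative stand-in for a registry handler, not a function from the book.

def handle_example(user_message, **kwargs):
    models = kwargs.get("models", "OpenAI")    # optional, with a default
    files_status = kwargs.get("files_status")  # optional, may be absent
    return f"model={models}, files={files_status}, msg={user_message}"

# Callers can supply as many or as few keyword arguments as they like;
# unknown keys (like future_flag) are simply absorbed by **kwargs.
print(handle_example("hi"))
print(handle_example("hi", models="DeepSeek"))
print(handle_example("hi", files_status=True, future_flag=1))
```

This is why a new parameter added to the dispatch loop later does not force every existing handler to be rewritten.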
We will now begin to build the handler registry with the Pinecone/RAG handler.
The Pinecone/RAG handler manages the retrieval-augmented generation (RAG) functions previously defined. It activates when detecting the Pinecone or RAG keyword within the user message:
# Pinecone / RAG handler: check only the current user message
(
lambda msg, instruct, mem, models, user_message,
**kwargs: "Pinecone" in user_message or "RAG" in user_message,
lambda msg, instruct, mem, models, user_message,
**kwargs: handle_pinecone_rag(user_message, models=models)
),
This handler checks whether the user message contains "Pinecone" or "RAG," in which case the condition lambda returns True; otherwise, it returns False. We will now create the reasoning handler.
We have already built the reasoning function, but now we need a handler. The keywords that trigger the handler are Use reasoning, customer, and activities. Any additional text in the message provides context for the reasoning process. The handler uses all() to ensure all keywords are included in the message:
# Reasoning handler: check only the current user message
(
lambda msg, instruct, mem, models, user_message, **kwargs: all(
keyword in user_message for keyword in [
"Use reasoning", "customer", "activities"
]
),
lambda msg, instruct, mem, models, user_message, **kwargs:
handle_reasoning_customer(user_message, models=models)
),
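The behavior of the all() condition is easy to verify in isolation: every keyword must appear as a substring of the message, or the condition fails. The two sample messages below are invented for the demonstration:

```python
# Every keyword must be present in the message for the condition to fire.
keywords = ["Use reasoning", "customer", "activities"]

msg1 = "Use reasoning to analyze customer activities for merchandise kits."
msg2 = "Use reasoning to analyze customer feedback."

print(all(k in msg1 for k in keywords))  # True: all three substrings present
print(all(k in msg2 for k in keywords))  # False: "activities" is missing
```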
Let’s move on and create the analysis handler.
The analysis handler has been used for memory analysis up to now and is triggered by the Analysis instruction:
# Analysis handler: determined by the instruct flag
(
lambda msg, instruct, mem, models, user_message,
**kwargs: instruct == "Analysis",
lambda msg, instruct, mem, models, user_message,
**kwargs: handle_analysis(
user_message, models=models)
),
Time to create the generation handler.
The generation handler takes memory analysis to another level by asking the generative AI model to generate an engaging text for a customer based on a memory analysis of the text. The Generation instruction triggers the generation handler:
# Generation handler: determined by the instruct flag
(
lambda msg, instruct, mem, models, user_message,
**kwargs: instruct == "Generation",
lambda msg, instruct, mem, models, user_message,
**kwargs: handle_generation(
user_message, models=models)
),
Let’s now build the image creation handler.
The image creation handler is triggered by the Create and image keywords in the user message:
# Create image handler: check only the current user message
(
lambda msg, instruct, mem, models, user_message,
**kwargs: "Create" in user_message and "image" in user_message,
lambda msg, instruct, mem, models, user_message,
**kwargs: handle_image_creation(user_message, models=models)
)
]
We will now create the freestyle handler for when there is no keyword or instructions.
This handler is a general-purpose handler when there is no instruction or keyword to trigger a specific function. Let’s append the fallback memory handler accordingly:
# Append the fallback memory handler for when instruct is "None"
handlers.append(
(
lambda msg, instruct, mem, models, user_message,
**kwargs: instruct == "None",
lambda msg, instruct, mem, models, user_message,
**kwargs: handle_with_memory(
msg,
user_message,
files_status=kwargs.get('files_status'),
instruct=instruct,
mem=memory_enabled, # Replace user_memory with memory_enabled
models=models
)
)
)
Note that we have replaced user_memory with memory_enabled to generalize memory management.
You can add as many handlers and AI functions as you wish to the handler registry. You can scale your GenAISys as much as you need to. You can also modify the keywords by replacing them with explicit instructions, as we did for the Analysis and generation functions. The handlers will then call all the AI functions you need.
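Extending the registry follows the same append pattern we used for the fallback handler. The sketch below adds a hypothetical translation handler; handle_translation and the "Translate" keyword are invented for the example and are not part of the book's notebook:

```python
# Hedged sketch: appending a new condition/handler pair to the registry.
handlers = []  # in the notebook, this list already holds the built-in handlers

def handle_translation(user_message, **kwargs):
    # Placeholder: a real handler would call a translation model here
    return f"(translation of: {user_message})"

handlers.append(
    (
        lambda msg, instruct, mem, models, user_message, **kwargs:
            "Translate" in user_message,
        lambda msg, instruct, mem, models, user_message, **kwargs:
            handle_translation(user_message, models=models)
    )
)

# The selection loop picks it up like any other handler:
for condition, handler in handlers:
    if condition([], "None", True, "OpenAI", "Translate this sentence"):
        print(handler([], "None", True, "OpenAI", "Translate this sentence"))
```

No change to chat_with_gpt is needed; the dispatch loop discovers the new pair automatically.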
Let’s now go through the new organization of the AI functions.
We will now run the AI functions that are activated by the handler registry. The functions build on those from earlier chapters but are now managed by the handler-selection mechanism introduced in this chapter. Additionally, the examples used in this section are based on typical prompts related to product design and production scenarios. Keep in mind that, due to the stochastic (probabilistic) nature of generative AI models, outputs can vary each time we run these tasks.
Figure 7.9: AI functions called by the handler selection mechanism and registry
We’ll now execute all AI functions currently available in our GenAISys, incorporating DeepSeek model calls where applicable. Let’s begin with the RAG functions.
Functions such as speech synthesis, file management, dialogue history, and summary generation remain unchanged from previous chapters.
This RAG function can run with OpenAI or DeepSeek with the Pinecone keyword in the user message. The RAG function’s name has changed, but its process remains unchanged for the query:
# Define Handler Functions
def handle_pinecone_rag(user_message, **kwargs):
if "Pinecone" in user_message:
namespace = "genaisys"
if "RAG" in user_message:
namespace = "data01"
print(namespace)
query_text = user_message
query_results = get_query_results(query_text, namespace)
print("Processed query results:")
qtext, target_id = display_results(query_results)
print(qtext)
# Run task
sc_input = qtext + " " + user_message
However, the function now contains a DeepSeek distilled R1 call. The function first defaults to OpenAI if no model is provided or if DeepSeek is deactivated:
models = kwargs.get("models", "OpenAI") # Default to OpenAI if not provided
if models == "DeepSeek" and deepseek==False:
models="OpenAI"
if models == "OpenAI":
task_response = reason.make_openai_api_call(
sc_input, "system","You are an assistant who executes the tasks you are asked to do.", "user")
If DeepSeek is activated, it will be called if chosen in the IPython interface for this task:
if models == "DeepSeek":
…
return f"{namespace}:{models}: {task_response}"
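The keyword-to-namespace routing at the top of handle_pinecone_rag can be isolated and tested on its own. Note a subtlety worth remembering when both keywords appear: "RAG" wins because its check runs second and overwrites the namespace:

```python
# The namespace-routing logic from handle_pinecone_rag, extracted as a sketch.
def select_namespace(user_message):
    namespace = None  # the real handler only fires when a keyword is present
    if "Pinecone" in user_message:
        namespace = "genaisys"
    if "RAG" in user_message:
        namespace = "data01"
    return namespace

print(select_namespace("Ask Pinecone about this"))  # genaisys
print(select_namespace("Use RAG to retrieve it"))   # data01
print(select_namespace("Pinecone with RAG"))        # data01: the second check overrides
```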
We will first run a sentiment analysis.
An example user input by the PDPM is the following:
Let's see what Pinecone thinks about this sentence: The customer did not like the design of the traveling bag we gave out because it was not the right color requested.
The output with OpenAI selected (default) and Agent checked will be as follows:
Agent: genaisys:OpenAI: Sentiment analysis score: 0.3
Explanation: The sentiment of the text is predominantly negative. The customer expresses dissatisfaction with the traveling bag due to it not being the right color as requested. The use of phrases like "did not like" and "not the right color" indicates a negative sentiment, as the customer's expectations were not met. The score of 0.3 reflects this negative sentiment, though it is not extremely low because the issue is specific to the color and not the overall quality or functionality of the bag.
On the other hand, the output with DeepSeek selected and Agent checked will be as follows:
Agent: genaisys:DeepSeek:
…The numerical value should be between 0 and 1, so let's say 0.2….
Wait, but the user didn't specify a particular method or model for sentiment analysis. Maybe they want a simple rule-based approach? Let's consider the presence of negative words like "did not like" and "not the right color requested." These words indicate dissatisfaction. However, the sentiment isn't extremely negative since it's just one aspect (color) that's the issue, not the entire product or experience. So, a score of 0.2 seems appropriate because it reflects a moderate negative sentiment based on the specific criticism given.
The sentiment score and explanation are acceptable in both cases. Imagine receiving thousands of such customer feedback messages: the GenAISys filters out the low scores automatically and stores these outputs in the customer database.
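The downstream filtering step could be sketched as a simple threshold pass over scored feedback. The records and the 0.5 cutoff below are invented for illustration; a real project would tune the threshold and write to an actual database:

```python
# Illustrative sketch: flag low-scoring feedback for follow-up.
feedback = [
    {"customer": "C1", "score": 0.3, "text": "Wrong color on the bag"},
    {"customer": "C2", "score": 0.9, "text": "Loved the design"},
    {"customer": "C3", "score": 0.2, "text": "Size did not match the order"},
]

THRESHOLD = 0.5  # assumed cutoff, not a value from the book
flagged = [f for f in feedback if f["score"] < THRESHOLD]

for f in flagged:
    print(f["customer"], f["score"], f["text"])  # C1 and C3 are flagged
```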
Now, the PDPM checks semantic analysis.
Consider another example input by the PDPM:
Let's see what meaning analysis Pinecone on this comment: The customer did not accept the traveling bag we gave out for free because the size and colors did not match the order processed.
OpenAI’s output is acceptable and clearly outlines the semantic relationships within the message:
Agent: genaisys:OpenAI: Semantic analysis involves understanding the meaning and interpretation of words, phrases, and sentences in context. It is a crucial aspect of natural language processing (NLP) that helps in extracting meaningful information from text. In the context of your comment, semantic analysis would involve breaking down the sentence to understand the underlying meaning and the relationships between different components of the sentence.
Here's a breakdown of the semantic elements in the comment:
1. Subject: "The customer" - This refers to the individual who is the focus of the action.
2. Action: "did not accept" - This indicates the customer's refusal or rejection of something.
3. Object: "the traveling bag we gave out for free" - This is the item that was offered to the customer without charge.
4. Reason: "because the size and colors did not match the order processed" - This provides the explanation for the customer's refusal, indicating a mismatch between what was expected (the order) and what was received (the traveling bag).
…
DeepSeek’s output is relevant as well:
Agent: genaisys:DeepSeek: 300,Semantic analysis…This is not an analysis but a semantic search. Provide more information on the topic. …
The original problem was that the customer didn't accept the travel bag due to mismatched size and color from the ordered one. So, my goal here should be to understand why this happened and perhaps suggest solutions.
I think the first step would be to look into whether there was a miscommunication in the ordering process. Maybe the website had incorrect images or descriptions leading to wrong expectations. Or perhaps during manufacturing, sizes got altered without notice. Another angle could be customer feedback—have others reported similar issues? It might also help to check if there were any system errors in processing orders which caused discrepancies between what was ordered and what was delivered.
…
# Final Answer
After conducting thorough research and identifying potential contributing factors, effective solutions have been implemented to address the discrepancy issues, resulting in improved satisfaction among customers receiving accurate products aligned with their expectations.
DeepSeek’s answer is longer and more complex. However, what would a team prefer? A shorter answer like OpenAI’s response or a longer one with more explanations? The decision can be reached through workshops and meetings.
Both OpenAI and DeepSeek can be used for data retrieval. The user input for a product designer could be as follows:
I'm a product designer and am trying to analyze human memory types. Use RAG to tell us what the CTO said about different memory types
The output is satisfactory:
Agent: data01:OpenAI: To analyze the CTO's statement about different memory types using Retrieval-Augmented Generation (RAG), we can break down the information into the relevant memory types and their application in the hybrid travel marketing campaign:
1. Memoryless: This refers to systems or processes that do not retain any information from past interactions. In the context of your campaign, this might relate to interactions that do not require historical data, such as one-time queries or transactions…
We thus have a flexible RAG system in our GenAISys that can run with the models we wish. However, we still have to evaluate the models for each set of tasks we want to perform.
The CoT function operates with Files checked and defaults to OpenAI as the model provider. Its implementation is consistent with how it was built and run in the previous chapter. The key difference is that it is now integrated into the handler selection mechanism, which activates based on specific keywords in the input:
def handle_reasoning_customer(user_message, **kwargs):
initial_query = user_message
download("Chapter05", "customer_activities.csv")
reasoning_steps = reason.chain_of_thought_reasoning(initial_query)
return reasoning_steps
Consider an example user input from the PDPM:
Use reasoning to analyze customer activities so I can design custom travel merchandise kits.
Agent: data01:OpenAI: To analyze the CTO's statement about different memory types using Retrieval-Augmented Generation (RAG), we can break down the memory types mentioned and their relevance to the hybrid travel marketing campaign:
1. Memoryless: This refers to systems or processes that do not retain any information from past interactions. In the context of a marketing campaign, this would mean treating each customer interaction as a standalone event without any historical context….
Let’s now see how memory analysis will run with both model sources (OpenAI and DeepSeek).
Both OpenAI and DeepSeek models handle memory-based customer profiles using neuroscientific-style categorizations. The function has been adapted to the handler selection process and contains a DeepSeek call:
def handle_analysis(user_message, **kwargs):
from cot_messages_c6 import system_message_s1
models = kwargs.get("models", "OpenAI") # Default to OpenAI if not provided
if models == "DeepSeek" and deepseek==False:
models="OpenAI"
if models == "OpenAI":
reasoning_steps = reason.make_openai_reasoning_call(
user_message, system_message_s1)
if models == "DeepSeek":
…
return reasoning_steps
An example user input using the Analysis option in the Reasoning list could be:
The hotel was great because the view reminded me of when I came here to Denver with my parents. I sure would have like to have received a custom T-shirt as a souvenir. Anyway, it was warm and sunny so we had an amazing time.
OpenAI’s output contains a useful segment highlighting the emotional dimension related to the customer’s wish for a personalized souvenir, which could help the product designer with their merchandise kit production endeavor:
…
Segment 2: "I sure would have like to have received a custom T-shirt as a souvenir."
- Memory Tags: [Episodic Memory]
- Dimension: Emotional
- Sentiment Score: 0.4
- Explanation: Here the speaker expresses a personal wish or regret about a missing souvenir from the event. Although it doesn't recount an actual episode in detail, it still connects to the personal event and reflects a feeling of slight disappointment, thereby engaging episodic memory and an emotional dimension with a modestly negative sentiment.
DeepSeek’s output, however, goes off track. It first finds the right task to do:
Okay let's see this through step by step now...
Alright, I need to tackle analyzing segments from the given user response according to their detailed tagging system based on cognitive psychology principles regarding STM vs LTM, semantic versus episodic memory, reality vs fiction, among others plus dimensions such as Intellectual, Emotional, Physical, along with assigning sentiments scores ranging from 0-1 reflecting positivity.
Let me start reading carefully paragraph-wise.
First sentence:"The hotel wasgreatbecauseviewremindedmeofwhencyamehere todallas."
But it then gets lost and seems to struggle with formatting and coherence, introducing irregular spacing and even foreign characters:
…Butwait,theuser later talks about souvenirs wantingcustomTshirtswhichmaybe indicatespositiveintent。…
但此处更多的是体验性的(Eating和Enjoying)=所以可能既有知识元素也有身体维度的食物味道。但主要在这里是描述经历,因此属于Episode或语义吗?…
DeepSeek can certainly do better, but improving this result would require additional iterations of prompt refinement or selecting a more robust DeepSeek variant or API. Investing time in refining prompts carries some risk, as even then, the outcome may not meet your expectations. Whether to refine the prompt, switch to a DeepSeek API, explore another DeepSeek variant, or default to OpenAI should ultimately be decided collaboratively within the team and based on your project’s needs.
Let’s now move on to running the generation function.
The generation function (select Generation in the Reasoning list), active by default with OpenAI, Agent, and Files checked, supports the creation of engaging, memory-based customer messages:
def handle_generation(user_message, **kwargs):
from cot_messages_c6 import (
system_message_s1, generation, imcontent4, imcontent4b
)
reasoning_steps = reason.memory_reasoning_thread(
user_message, system_message_s1, generation,
imcontent4, imcontent4b
)
return reasoning_steps
Let’s consider a general user input as an example:
The hotel was great because the view reminded me of when I came here to Denver with my parents. I sure would have like to have received a custom T-shirt as a souvenir. Anyway, it was warm and sunny so we had an amazing time.
OpenAI’s output is an appealing customer-facing message, blending nostalgia and merchandising suggestions, accompanied by an appropriate custom T-shirt image:
Customer message: Dear Customer,
Experience the charm of Denver with a nostalgic hotel view and enjoy the sunny weather. Explore the beautiful Denver Botanic Gardens and the iconic Red Rocks Amphitheatre. Don't miss out on exclusive souvenirs from local artists and a personalized T-shirt to remember your trip.
Best regards,
Figure 7.10: A personal image for a customer
This functionality utilizes DALL-E to generate images, with the Files box checked. The function does not change beyond being adapted to the handler-selection mechanism, which activates this feature with the Create and image keywords in the user input:
def handle_image_creation(user_message, **kwargs):
prompt = user_message
image_url = reason.generate_image(
prompt, model="dall-e-3", size="1024x1024",
quality="standard", n=1
)
# Save the image locally
save_path = "c_image.png"
image_data = requests.get(image_url).content
with open(save_path, "wb") as file:
file.write(image_data)
return "Image created"
The product designer could use it to ideate merchandising kits:
Create an image: Create an image of a custom T-shirt with surfing in Hawaii on big waves on it to look cool.
The output is a cool T-shirt that the production team could use and adapt for production:
Figure 7.11: Custom T-shirt design
We will now create freestyle prompts that are not triggered by any keywords or instructions.
This general-purpose handler activates when no specific instruction or keyword matches the input. handle_with_memory runs with OpenAI and DeepSeek, depending on the model selected. The memory of a user dialogue is set with a global variable, memory_enabled, that is initialized at the beginning:
# Global variable to ensure memory is always used
memory_enabled = True # Set to True to retain conversation memory
def handle_with_memory(messages, user_message, **kwargs):
global memory_enabled # Ensure global memory setting is used
The function will return a message and stop if memory_enabled is set to False:
# If memory is disabled, respond with a message
if not memory_enabled:
return "Memory is disabled."
It will process the past messages of a user from the conversation history:
    # Extract all past messages (user + assistant) from the conversation history
    conversation_history = [
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in messages if "content" in msg
    ]
    # Combine all conversation history
    combined_history = "\n".join(conversation_history)
    # Append the latest user message to the history
    full_context = f"{combined_history}\nUser: {user_message}"
然后,如果在 IPython 界面中没有选择其他模型,则默认使用 OpenAI 作为模型提供程序来选择模型:
Then, the models are selected with OpenAI being the default model provider if no other model is selected in the IPython interface:
    models = kwargs.get("models", "OpenAI")  # Default to OpenAI if not provided
    if models == "OpenAI":
        task_response = reason.make_openai_api_call(
            full_context, "system",
            "You are an assistant who executes the tasks you are asked to do.",
            "user"
        )
    elif models == "DeepSeek":
        # Tokenize the full conversation history for DeepSeek
        …
        # Decode the output
        task_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
响应消息将被存储并返回:
The response message is stored and returned:
    # Store bot response in memory
    messages.append({"role": "assistant", "content": task_response})
    return task_response
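To see the memory flow end to end, here is a self-contained sketch with the model call stubbed out. fake_llm stands in for reason.make_openai_api_call or DeepSeek generation, and appending the user turn to messages is an addition for illustration (the chapter's snippet stores only the assistant response):

```python
memory_enabled = True  # Set to False to disable conversation memory

def fake_llm(full_context):
    # Stand-in for the real OpenAI / DeepSeek call
    return f"Answering based on {full_context.count(chr(10)) + 1} context lines"

def handle_with_memory(messages, user_message, **kwargs):
    global memory_enabled
    if not memory_enabled:
        return "Memory is disabled."
    # Rebuild the dialogue history, then append the newest user turn
    conversation_history = [
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in messages if "content" in msg
    ]
    full_context = "\n".join(conversation_history + [f"User: {user_message}"])
    task_response = fake_llm(full_context)
    # Persist both turns so the next call sees them
    messages.append({"role": "user", "content": user_message})
    messages.append({"role": "assistant", "content": task_response})
    return task_response

msgs = []
handle_with_memory(msgs, "Hello")
handle_with_memory(msgs, "What did I just say?")
print(len(msgs))  # -> 4 (two user turns + two assistant turns)
```

Each call sees a longer context than the previous one, which is exactly the behavior the memory_enabled flag controls.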
PDPM 的输入示例可能如下:
An example input by the PDPM could be the following:
We are new to production processes and experience issues with the production of a customized travel bag by Supplier S100 for a trip to Hawaii that should have an image of Hawaii on it with the name of the customer. The production process contained operations 10, 20, 30, 40, 50, and 60. Operation 40 is the operation that prints the image and the name of the customer on the bag. Supplier S100 lacks the special ink to perform the operation. Supplier S200 has the ink but the production of operations 10 to 30 have already been performed. Instructions: think this problem through and provide real-world production ideas to solve this issue.
OpenAI 的回答既可以接受,又很有成效。请花些时间阅读提示和回答,它们展示了生成式人工智能从通用自然语言处理任务向零样本特定领域任务的转变:
OpenAI’s answer is both acceptable and productive. Take your time to read the prompt and the response, which shows the transition of generative AI from NLP general tasks to zero-shot domain-specific tasks:
Agent: To address the issue of Supplier S100 lacking the special ink needed for operation 40, while Supplier S200 has the ink, consider the following real-world production solutions:
1. Outsource Operation 40 to Supplier S200:
- Coordinate with Supplier S200 to perform operation 40. This involves transporting the semi-finished bags (after operations 10 to 30) to Supplier S200 for printing.
- Ensure that Supplier S200 is capable of handling the volume and has the necessary equipment to perform the operation to your quality standards.
2. Ink Transfer:
- If feasible, arrange for Supplier S200 to supply the special ink to Supplier S100. This would allow Supplier S100 to continue with operation 40 without needing to transport the bags.
- Ensure that Supplier S100 has the necessary equipment and expertise to use the ink provided by Supplier S200.
3. Collaborative Production:
- Establish a partnership between Supplier S100 and Supplier S200 where both suppliers work together to complete the production process. This could involve sharing resources, expertise, and logistics.
4. Logistics and Scheduling:
- Plan the logistics for transporting the bags between suppliers efficiently to minimize delays.
- Adjust the production schedule to accommodate the additional time required for transportation and coordination between suppliers.
5. Quality Control:
- Implement a quality control process to ensure that the printing meets the required standards, regardless of which supplier performs the operation.
- Conduct inspections before and after operation 40 to maintain quality consistency.
6. Communication and Coordination:
- Maintain clear communication with both suppliers to ensure that all parties are aligned on timelines, quality expectations, and responsibilities.
- Use project management tools to track progress and address any issues promptly.
7. Contingency Planning:
- Develop a contingency plan in case of further supply chain disruptions, such as identifying alternative suppliers or inks that could be used in the future.
DeepSeek 的回答虽然包含一些有趣的观点,但质量并不相同:
DeepSeek’s answer does not provide the same quality, although it contains some interesting points:
C.), Utilize Alternate Suppliers With Intermediaries Who Could Possibly Acquire Required Inks From Other Sources Perhaps Via Importing Directly Under Different Terms Agreements,
D ). Leverage Existing Inventory Wherever Feasible Reallocating Stockpiles Smartly Across Production Lines Maybe Even Repurposing Materials When Not Used For Primary Function Into Secondary Applications Which May Free Up Costs And Resources While Mitigating Penalties Imposed By Delays,
要获得更好的结果,需要进一步的提示设计和输出分析周期,并评估未经蒸馏的 DeepSeek 模型,例如 DeepSeek-V3 或 DeepSeek-R1。DeepSeek 肯定可以做得更好,正如在 https://chat.deepseek.com/ 上使用 DeepSeek-R1 所证明的那样,它产生了以下输出:
Obtaining a better result would require further prompt design and output analysis cycles, evaluating DeepSeek models that are not distilled, such as DeepSeek-V3 or DeepSeek-R1. DeepSeek can surely do better, as demonstrated by using DeepSeek-R1 on https://chat.deepseek.com/, which produced the following output:
To resolve the production issue with the customized travel bags, consider the following structured solutions:
### 1. **Outsource Operation 40 to Supplier S200**
- **Process**:
- S100 ships semi-finished bags (post-operation 30) to S200.
- S200 performs operation 40 (printing) using their special ink.
- Bags are returned to S100 to complete operations 50 and 60.
- **Key Considerations**:
- Use expedited shipping (e.g., air freight) to minimize delays.
- Ensure packaging protects semi-finished goods during transit.
- Negotiate cost-sharing for logistics between S100 and S200.
### 2. **Expedite Ink Procurement for S100**
- **Action**:
- Source the ink directly from S200 or an alternative supplier.
- Prioritize urgent delivery (e.g., same-day courier).
- **Feasibility Check**:
- Verify if the ink is proprietary or available on the open market.
- Address contractual barriers if S200 is a competitor.
### 3. **Supplier Collaboration for End-to-End Production**
- **Option**:
- S200 takes over all remaining steps (40, 50, 60) after S100 completes operations 10–30.
- **Requirements**:
- Confirm S200's capability to perform operations 50 and 60.
- Align quality standards between suppliers to avoid inconsistencies.
### 4. **Process Re-engineering**
- **Temporary Workaround**:
- Explore alternative printing methods (e.g., heat transfer, patches) if customer specs allow.
- Assess feasibility of resequencing operations (e.g., printing later), though this is unlikely.
### 5. **Proactive Communication and Risk Mitigation**
- **Customer Transparency**:
- Inform the customer of potential delays and offer solutions (e.g., partial deliveries, discounts).
- **Prevent Future Issues**:
- Implement dual sourcing for critical materials (e.g., ink).
- Use inventory management tools to track material availability in real-time.
### **Cost vs. Speed Tradeoff Analysis**
- **Outsourcing to S200**: Higher logistics costs but faster resolution if suppliers are geographically close.
- **Expedited Ink**: Higher material costs but avoids production handoffs.
### **Recommended Approach**
1. **Prioritize Outsourcing Operation 40**:
- This balances speed and feasibility if S200 can print without disrupting the workflow.
2. **Backup Plan**:
- If logistics are too slow, expedite ink procurement while negotiating with S200 for a bulk discount.
By combining supplier collaboration, agile logistics, and proactive planning, the production delay can be minimized while maintaining quality and customer trust.
请仔细阅读本节中的提示和输出。出于安全和隐私考虑,我们仅使用本地安装的、经过蒸馏的 Hugging Face 开源版 DeepSeek-R1。但是,如果您拥有必要的权限,也可以像使用 ChatGPT 或任何其他在线平台一样,使用 DeepSeek 的在线版本来完成某些任务,例如本节中的生产示例。根据项目的具体要求,您还可以探索美国本土的 DeepSeek API 或其他部署方案。
Read the prompts and outputs in this section carefully. For security and privacy reasons, we are using only a locally installed, distilled Hugging Face open source version of DeepSeek-R1. However, you could use the online version of DeepSeek for certain tasks, such as the production example in this section, if you have the necessary permissions, just as you would with ChatGPT or any other online platform. Depending on your project’s specifications, you could also explore US-based DeepSeek APIs or alternative deployment approaches.
总之,只要引导得当,并选择最合适的模型版本,OpenAI 和 DeepSeek 都能够提供有效的生产解决方案。生成式人工智能显然已经进入了一个新阶段!
In any case, both OpenAI and DeepSeek are capable of delivering effective production solutions when correctly prompted and when the most appropriate model version is selected. Generative AI has clearly entered a new phase!
在本章中,我们进一步探索了生成式人工智能系统。首先,我们花时间了解了DeepSeek-R1的出现,这是一个功能强大的开源推理模型,以其在训练效率方面的创新改进而闻名。这一进展立即给项目经理们提出了一个关键问题:我们应该不断追随实时趋势,还是应该优先维护系统的稳定性?
In this chapter, we moved further along our journey into generative AI systems. First, we took the time to digest the arrival of DeepSeek-R1, a powerful open source reasoning model known for innovative efficiency improvements in training. This development immediately raised a critical question for project managers: should we constantly follow real-time trends or prioritize maintaining a stable system?
为了应对这一挑战,我们开发了一种平衡的解决方案,即构建一个处理器选择机制。该机制处理用户消息,触发处理器注册表中的处理器,然后激活相应的AI功能。为了确保灵活性和适应性,我们更新了IPython接口,使用户能够在启动任务之前轻松地在OpenAI和DeepSeek模型之间进行选择。
To address this challenge, we developed a balanced solution by building a handler selection mechanism. This mechanism processes user messages, triggers handlers within a handler registry, and then activates the appropriate AI functions. To ensure flexibility and adaptability, we updated our IPython interface, allowing users to easily select between OpenAI and DeepSeek models before initiating a task.
这种设计允许 GenAISys 管理员引入新的实验模型或任何其他功能(非 AI、ML 或 DL),同时还能访问已验证的结果。例如,在分析用户评论时,管理员可以使用可靠的 OpenAI 模型运行任务,同时评估 DeepSeek 模型。管理员还可以根据需要禁用特定模型,从而在稳定性和创新性之间取得切实可行的平衡,这在当今快节奏的 AI 环境中至关重要。
This design allows the GenAISys administrator to introduce new experimental models or any other functions (non-AI, ML, or DL) while maintaining access to proven results. For instance, when analyzing user comments, administrators can run tasks using the reliable OpenAI model while simultaneously evaluating the DeepSeek model. Administrators can also disable specific models when necessary, providing a practical balance between stability and innovation, which is crucial in today's fast-paced AI environment.
为了切实实现这种平衡,我们首先在一台独立的笔记本电脑上安装并运行了 DeepSeek-R1-Distill-Llama-8B,并通过与生产环境相关的示例演示了其功能。然后,我们将这个精简后的模型集成到我们的 GenAISys 系统中,这就需要更高的灵活性和可扩展性。
To achieve this balance practically, we began by installing and running DeepSeek-R1-Distill-Llama-8B in an independent notebook, demonstrating its capabilities through production-related examples. We then integrated this distilled model into our GenAISys, creating a need for enhanced flexibility and scalability.
引入处理程序选择机制和结构化处理程序注册表,确保我们的系统能够高效且无限扩展。每个处理程序都采用统一的模块化格式,方便管理员进行管理、激活或停用。我们通过一系列与产品设计和生产相关的实际案例演示了这些处理程序。
The introduction of the handler selection mechanism and the structured handler registry ensures that our system can scale effectively and indefinitely. Each handler follows a unified, modular format, enabling easy management, activation, or deactivation by administrators. We demonstrated these handlers through a series of practical prompts related to product design and production.
我们现在已做好准备,扩展和升级我们的 GenAISys 系统,并在这个灵活的框架内添加新功能。在下一阶段,我们将继续推进这一进程,将 GenAISys 系统与更广阔的外部世界连接起来。
We are now positioned to expand and scale our GenAISys, adding new features within this adaptable framework. In the next chapter, we’ll continue this journey by connecting our GenAISys to the broader external world.
随着人工智能的作用不断扩大,轨迹分析已渗透到人类活动的方方面面,从披萨外卖到基因组测序,无所不包。本章将介绍城市尺度的出行预测,重点阐述缺失或噪声坐标如何影响配送、灾害管理、城市规划和疫情预测等实际应用。我们的出行系统架构借鉴了 Tang 等人 (2024) 的创新性工作。
As AI’s role continues to expand, trajectory analysis has permeated all human activity, from pizza deliveries to genome sequencing. This chapter introduces city-scale mobility prediction, highlighting how missing or noisy coordinates can undermine real-world applications in deliveries, disaster management, urban planning, and epidemic forecasting. The architecture of our mobility system draws inspiration from the innovative work of Tang et al. (2024).
我们将首先使用 1_Trajectory_simulation_and_prediction.ipynb 笔记本,构建一个先进的轨迹仿真与预测流程,并将其集成到我们的 GenAISys 系统中。主要目标是利用合成数据生成和大语言模型(LLM),解决人类短期和长期移动性建模的难题。然后,我们演示如何使用基于 Python 的解决方案来扩展这一理念,其中包括一个自定义的合成网格生成器,该生成器可以模拟二维城市地图上的随机轨迹,并故意插入缺失数据以进行测试。这些随机轨迹可以代表配送或其他序列,例如在线旅行社的旅行套餐(定制旅行包或小册子)。
We will first build and integrate an advanced trajectory simulation and prediction pipeline into our GenAISys using the 1_Trajectory_simulation_and_prediction.ipynb notebook. The main objective is to address the challenge of modeling human mobility, both short- and long-term, by leveraging synthetic data generation and large language models (LLMs). We then demonstrate how to build upon this idea using Python-based solutions, complete with a custom synthetic grid generator that simulates random trajectories through a two-dimensional city map, deliberately inserting missing data for testing. These random trajectories could represent deliveries or other sequences, such as travel packages (custom bags or booklets) for an online travel agency.
接下来,我们将构建一个多步骤协调器函数,该函数会将用户指令、合成数据集和特定领域的消息合并,然后将其传递给由 LLM 驱动的推理线程。该模型将检测并预测由占位符值(例如“ 999, 999”)标记的未知位置,并通过上下文插值填充这些空白。这种方法展示了基于文本的预测的可解释性,同时保持了系统的思路,包括调试步骤,例如在生成最终 JSON 输出之前记录缺失点。
Next, we will build a multistep orchestrator function that merges user instructions, the synthetic dataset, and domain-specific messages before passing them to an LLM-driven reasoning thread. The model will detect and predict unknown positions marked by placeholder values (such as 999, 999), filling these gaps through contextual interpolation. This approach demonstrates the interpretability of text-based predictions while maintaining a systematic chain of thought, including debugging steps such as logging missing points before producing the final JSON output.
为了支持强大的用户交互,我们将轨迹流程集成到我们构建的 GenAISys 多操作器环境中,允许用户通过“移动”指令请求来触发轨迹的创建和分析。我们将实现一个轨迹仿真和预测界面。可视化组件已集成到系统中,能够自动生成并以静态图像的形式显示最终路径(包括方向箭头、缺失数据标记和坐标修正)。数据生成、LLM 推理和用户界面之间的协同作用,充分展现了我们方法的端到端可行性,使用户能够根据需要将轨迹仿真和预测应用于不同的领域。
To support robust user interaction, we will integrate the trajectory pipeline into the GenAISys multihandler environment we’ve built, allowing requests for “mobility” instructions to trigger the creation and analysis of trajectories. We will implement a trajectory simulation and prediction interface. Visualization components are incorporated, automatically producing and displaying the resulting path (including direction arrows, missing data markers, and coordinate fixes) as a static image. The synergy between data generation, LLM inference, and the user interface showcases the end-to-end viability of our method, empowering users to apply trajectory simulation and prediction across different domains as needed.
本章提供了一个蓝图,用于将合成轨迹数据集与GenAISys中的提示驱动型LLM方法相结合。通过遵循Tang等人描述的设计模式,我们将探索纯文本模型如何在结构上进行最小修改的情况下,实现卓越的时空推理能力。将移动性模拟与用户友好界面相结合,可以为各种移动性分析场景提供高度可解释且精细的预测。
This chapter provides a blueprint for coupling synthetic trajectory datasets with a prompt-driven LLM approach in the GenAISys. By following the design patterns described by Tang et al., we will explore how purely text-oriented models can excel at spatial-temporal reasoning with minimal structural modifications. Bridging mobility simulation and user-friendly interfaces can provide highly interpretable, fine-grained predictions for a variety of mobility analytics scenarios.
本章涵盖以下主题:
This chapter covers the following topics:
我们首先来定义轨迹模拟和预测框架的范围。
Let’s begin by defining the scope of the trajectory simulation and prediction framework.
本节内容受 Tang 等人(2024)的论文《指令调整 Llama-3-8B 在城市规模出行预测中表现出色》启发。我们将探讨人类移动性预测所面临挑战的基本背景、论文的主要贡献,以及如何将这些想法转化为实际的 Python 实现。
This section is inspired by Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction by Tang et al. (2024). We will explore the essential background on the challenges of human mobility prediction, the paper’s key contributions, and how these ideas can be translated into practical Python implementations.
人口流动预测侧重于预测个人(或群体)的出行地点和时间,它在不断扩展的领域中发挥着至关重要的作用,包括以下领域:
Human mobility prediction focuses on forecasting where and when individuals (or groups) will travel, and it plays a critical role in an expanding set of domains, including the following:
在本例中,我们将首先应用移动性预测来为在线旅行社的客户交付定制产品(例如,包、T恤和小册子)。
In our case, we will first apply mobility prediction to the delivery of customized products (e.g., bags, T-shirts, and booklets) for an online travel agency’s customers.
传统上,这些预测依赖于专门的机器学习模型,例如带注意力机制的循环神经网络(RNN)或图神经网络(GNN)。虽然这些技术可能有效,但它们通常需要劳动密集型的特征工程,并且不容易推广到不同的地点或时间范围(例如,短期预测与长期预测)。
Traditionally, these predictions relied on specialized machine learning models, such as recurrent neural networks (RNNs) with attention mechanisms or graph neural networks (GNNs). While these techniques can be effective, they often require labor-intensive feature engineering and are not easily generalizable across diverse locations or time horizons (e.g., short-term versus long-term predictions).
现在让我们来探讨促使人们利用大语言模型(LLM)来解决这些问题的关键挑战。
Let’s now examine the key challenges motivating the use of LLMs to address these issues.
前沿的大语言模型针对长期困扰传统出行分析和预测系统的若干挑战,提出了有前景的解决方案:
Cutting-edge LLMs offer promising solutions to several challenges that have historically plagued traditional mobility analysis and prediction systems:
虽然 LLM 并非完美无缺,但它通过最大限度地减少人工特征工程来应对这些关键难题,为传统模型提供了一种有效的替代方案。让我们来看看它是如何实现的。
While LLMs are not perfect, they provide an effective alternative to traditional models by addressing these key obstacles with minimal manual feature engineering. Let’s see how.
从传统方法到大语言模型(LLM)的转变历程可以通过几次突破性的变革来追溯。传统方法需要投入大量人力资源来设计启发式算法、构建特征并实现复杂的特定领域解决方案。相比之下,生成式人工智能领域的最新突破——例如 Llama 3、GPT-4o、Grok 3、DeepSeek-V3 和 DeepSeek-R1——为推理和多模态机器智能开辟了令人兴奋的新途径。而且毋庸置疑,这仅仅是个开始!近期研究表明,这些模型能够很好地泛化到文本任务之外的其他任务,并在以下方面表现出色:
The journey from traditional approaches to LLMs can be traced through a few groundbreaking shifts. Traditional approaches consumed extensive human resources to design heuristics, engineer features, and implement complex domain-specific solutions. In contrast, recent breakthroughs in generative AI—such as Llama 3, GPT-4o, Grok 3, DeepSeek-V3, and DeepSeek-R1—have opened exciting new avenues in reasoning and multimodal machine intelligence. And make no mistake—this is just the beginning! Recent research highlights how these models can generalize well beyond text-based tasks, excelling in the following:
近期研究表明,在精心设计的提示或轻量级微调的引导下,大语言模型(LLM)甚至可以在城市尺度、长周期轨迹预测方面超越专业模型。本章将使用 GPT-4o,展示无需任何额外微调、仅通过零样本提示即可取得的显著成果。
Recent research has shown that LLMs, when guided by carefully crafted prompts or lightweight fine-tuning, can even surpass specialized models in city-scale, long-horizon trajectory prediction. In this chapter, we’ll demonstrate effective results with zero-shot prompting—without any additional fine-tuning—using GPT-4o.
然而,为了清楚地理解这一充满希望的方向,我们首先来考察一下为本章奠定基础的这篇论文的主要贡献。
To understand this promising direction clearly, however, let’s first examine the key contributions of the paper that served as a basis for this chapter.
由 Tang, P.、Yang, C.、Xing, T.、Xu, X.、Jiang, R. 和 Sezaki, K.(2024)组成的团队通过三项关键性创新将大语言模型(LLM)提升到了新的水平。
It took a team consisting of Tang, P., Yang, C., Xing, T., Xu, X., Jiang, R., and Sezaki, K. (2024) to take LLMs to the next level through three pivotal innovations.
作者并没有将原始坐标序列直接输入到标准的回归或分类模型中,而是将输入转换为一个包含以下内容的问题:
Instead of passing raw coordinate sequences into a standard regression or classification model, the authors transform the input into a question that includes the following:
这种问答形式充分利用了 LLM 固有的阅读指令和生成结构化输出的能力。
This Q&A style leverages the LLM’s inherent ability to read instructions and produce structured outputs.
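A sketch of how such a question can be assembled from raw (day, timeslot, x, y) records; the template wording and the build_mobility_prompt name are assumptions for illustration, not the paper's exact instruction format:

```python
def build_mobility_prompt(records, missing_value=999):
    """Turn (day, timeslot, x, y) records into a Q&A-style prompt,
    flagging placeholder coordinates as the values to predict."""
    lines = []
    for day, slot, x, y in records:
        if x == missing_value and y == missing_value:
            # Missing coordinates become explicit questions for the model
            lines.append(f"day={day} slot={slot} x=? y=?")
        else:
            lines.append(f"day={day} slot={slot} x={x} y={y}")
    header = ("You are given a trajectory on a grid city map. "
              "Predict the coordinates marked with '?'.\n")
    return header + "\n".join(lines)

prompt = build_mobility_prompt([(1, 0, 5, 5), (1, 1, 999, 999)])
print("x=? y=?" in prompt)  # -> True
```

The point is that the raw sequence becomes an instruction-plus-data question, which is the shape of input an instruction-tuned LLM reads natively.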
然后,他们对LLM进行了微调。
Then, they fine-tuned the LLM.
指令调整是一种技术,通过精心设计的提示和答案来微调学习学习模型(LLM),使其能够更好地理解和运用这些提示和答案。生成特定领域的输出,同时仍保留其通用的语言推理能力。作者展示了即使仅使用移动数据集的一小部分,也能实现这一点。经过微调,该模型仍然可以推广到新用户或新城市。在我们的案例中,即使没有数据集,我们也获得了可以接受的结果。
Instruction tuning is a technique where the LLM is fine-tuned with carefully designed prompts and answers, teaching it to produce domain-specific outputs while still retaining its general language reasoning capabilities. The authors showcase that even if you use only a fraction of the mobility dataset for fine-tuning, the model can still generalize to new users or new cities. In our case, we attained acceptable results without a dataset.
令人惊讶的是,正如我们在“构建轨迹模拟和预测函数”部分构建 Python 程序时将看到的,即使采用零样本、无微调的方法,我们也能取得很好的结果,利用 GPT-4o 出色的推理能力,而无需任何特定领域的微调数据。
Surprisingly enough, as we will see when we build the Python program in the Building the trajectory simulation and prediction function section, we achieve strong results even with a zero-shot, no-fine-tuning approach, leveraging GPT-4o’s exceptional reasoning capability without needing any domain-specific fine-tuning data.
随后,移动性研究团队解决了数据缺失的问题。
The mobility research team then solved the issue of missing data.
移动性数据集中一个常见的挑战是存在缺失坐标,通常用占位符值(例如 999)标记。基于 LLM 的系统专门负责填充这些空白,有效地执行时空插补。当然,这种方法并非没有局限性,我们将在运行移动性模拟时通过实际示例清楚地说明这些局限性。但在探讨这些局限性之前,让我们先着手构建解决方案。
A common challenge in mobility datasets is the presence of missing coordinates, typically marked with placeholder values such as 999. The LLM-based system is tasked explicitly with filling in these gaps, effectively performing spatiotemporal imputation. Naturally, this approach is not without limitations, which we’ll clearly illustrate through practical examples when we run our mobility simulation. But before exploring these boundaries, let’s first dive into building our solution.
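To make the imputation task concrete, here is a minimal numpy baseline that linearly interpolates placeholder-marked points. It is a naive stand-in useful for comparison, not the paper's LLM-based method:

```python
import numpy as np

def interpolate_missing(points, missing_value=999):
    """Linearly interpolate (x, y) rows equal to the missing placeholder."""
    pts = np.asarray(points, dtype=float)
    # A row is "missing" when every coordinate equals the placeholder
    missing = (pts == missing_value).all(axis=1)
    idx = np.arange(len(pts))
    for col in range(pts.shape[1]):
        # Interpolate each coordinate column over the known points
        pts[missing, col] = np.interp(
            idx[missing], idx[~missing], pts[~missing, col]
        )
    return pts

track = [(10, 20), (999, 999), (12, 22), (13, 23)]
print(interpolate_missing(track)[1])  # -> [11. 21.]
```

Unlike this baseline, the LLM can also use the day and timeslot context, which is precisely where contextual imputation outperforms straight-line interpolation.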
下一节中,我们将使用 OpenAI 模型开发一个轨迹(移动性)仿真和分析组件。然后,我们将把这个移动性功能集成到GenAISys 的第 3 层,如图8.1中的函数F4.1所示。我们还将更新第 2 层,以注册处理程序,并确保它可以在第 1 层的 IPython 接口级别激活。
In the next section, we will develop a trajectory (mobility) simulation and analysis component using OpenAI models. We will then integrate this mobility function into Layer 3 of our GenAISys, as illustrated in Figure 8.1 with function F4.1. We will also update Layer 2 to register the handler and ensure it can be activated at the IPython interface level in Layer 1.
图 8.1:轨迹仿真与预测的整合
Figure 8.1: Integrating trajectory simulations and predictions
一旦轨迹模拟与预测组件集成到我们的 GenAISys 系统中,即可应用于配送和各种与移动性相关的任务。我们将首先模拟为在线旅行社的客户配送定制礼品(例如品牌包、T 恤和小册子),然后再探索其他潜在应用。现在,让我们构建轨迹仿真模型!
Once the trajectory simulation and prediction component is integrated into our GenAISys, it can be applied to deliveries and a wide range of mobility-related tasks. We will start by modeling the delivery of customized goodies—such as branded bags, T-shirts, and booklets—for customers of an online travel agency, and then explore other potential applications. For now, let’s build our trajectory simulation!
本节的目标是创建稳健的轨迹仿真模型,准备预测函数,并运行 OpenAI LLM 来分析合成轨迹数据并预测缺失的坐标。稍后,在“为 GenAISys 添加移动智能”一节中,我们将把这些功能集成到我们全面的 GenAISys 框架中。
The goal of this section is to create a robust trajectory simulation, prepare the predictive functions, and run an OpenAI LLM to analyze synthetic trajectory data and predict missing coordinates. Later, in the Adding mobility intelligence to the GenAISys section, we’ll integrate this into our comprehensive GenAISys framework.
打开 GitHub 上 Chapter08 目录下的 1_Trajectory_simulation_and_prediction.ipynb 笔记本(https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main)。初始设置与 Chapter07/GenAISys_DeepSeek.ipynb 的环境配置一致,包括以下内容:
Open the 1_Trajectory_simulation_and_prediction.ipynb notebook within the Chapter08 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). The initial setup mirrors the environment configuration in Chapter07/GenAISys_DeepSeek.ipynb and includes the following:
我们将分三个主要步骤构建该程序,如图8.2所示:
We will build the program in three main steps, as shown in Figure 8.2:
在“向 GenAISys添加移动智能”部分中,我们将把移动性协调器添加到 GenAISys 的处理程序注册表中,并在通过 IPython 接口激活时,由处理程序选择机制进行管理。在本节中,我们将直接调用移动性协调器。
The mobility orchestrator will be added to the handlers registry in our GenAISys in the Adding mobility intelligence to the GenAISys section and managed by the handler selection mechanism when activated by the IPython interface. In this section, we will call the mobility orchestrator directly.
图 8.2阐明了移动性协调器、轨迹模拟器和生成式人工智能预测器之间的关系。这种智能体的组合与轨迹分析和预测框架保持紧密一致。
Figure 8.2 articulates the relationship between the mobility orchestrator, the trajectory simulator, and the generative AI predictor. This mixture of agents maintains close alignment with the framework of trajectory analysis and predictions.
图 8.2:移动性协调器的功能
Figure 8.2: The functions of the mobility orchestrator
我们将首先创建轨迹模拟。
We will first begin by creating the trajectory simulation.
Tang 等人撰写的参考论文展示了如何通过指令微调 LLM 来填充缺失的轨迹坐标,并预测网格化城市地图中的未来位置。需要注意的是,在本例中,我们将利用 OpenAI API 消息对象的强大功能,在论文的框架内,通过实时零样本提示取得有效的结果。
The reference paper by Tang et al. demonstrates how an LLM can be instruction-tuned to fill missing trajectory coordinates and predict future positions in a grid-based city map. Note that in our case, we will leverage the power of the OpenAI API message object to achieve an effective result with zero-shot prompts in real time, within the framework of the paper.
他们的方法论中的一个重要步骤是拥有(日期、时间段、x、y)记录,其中一些坐标可能缺失(例如,999, 999),以表示未知位置。
One important step in their methodology involves having (day, timeslot, x, y) records, with some coordinates possibly missing (e.g., 999, 999) to indicate unknown positions.
我们将要编写的 create_grid_with_trajectory() 函数通过以下步骤模拟了该场景的一个小规模版本:
The function that we will write, create_grid_with_trajectory(), essentially simulates a smaller-scale version of this scenario by doing the following:
将部分坐标替换为占位符 (999, 999),以模拟现实世界的数据缺失。
Replacing some coordinates with the placeholder (999, 999) simulates real-world data gaps.
这种合成生成方法对于测试或概念验证演示非常有用,这与本文的精神不谋而合:
This kind of synthetic generation is useful for testing or proof-of-concept demos, echoing the spirit of the paper:
部分坐标被标记为 999, 999,LLM 或其他模型稍后可以尝试填充这些缺失值。
Some coordinates are marked as 999, 999, which the LLM or another model can later attempt to fill in.
现在让我们一步一步地了解轨迹模拟函数:
Let’s now go through the trajectory simulation function step by step:
def create_grid_with_trajectory(
    grid_size=200, num_points=50, missing_count=5
):
    grid = np.zeros((grid_size, grid_size), dtype=int)
    trajectory = []
参数如下:
The parameters are as follows:
grid_size=200:网格沿一个轴的尺寸(因此网格为 200×200)。
grid_size=200: The size of the grid along one axis (so the grid is 200×200).
num_points=50:将生成多少个轨迹点(或步数)。
num_points=50: How many trajectory points (or steps) will be generated.
missing_count=5:其中有多少个点会被故意转换为缺失坐标 (999, 999)。
missing_count=5: How many of those points will be deliberately turned into missing coordinates (999, 999).
grid = np.zeros((grid_size, grid_size), dtype=int) 创建一个二维整型零数组。可以将 grid[x][y] 视为该单元格的状态,初始值为 0。trajectory = [] 将保存形如(日期、时间段、x、y)的元组。
grid = np.zeros((grid_size, grid_size), dtype=int) creates a two-dimensional array of zeros (of the int type). Think of grid[x][y] as the status of that cell, initially 0. trajectory = [] will hold tuples of the form (day, timeslot, x, y).
This mirrors the discretized city concept in the paper, where each (x, y) cell might represent a zone within the city.
    x = random.randint(0, grid_size - 1)
    y = random.randint(0, grid_size - 1)
    day = random.randint(1, 365)
    timeslot = random.randint(0, 47)
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    current_dir_index = random.randint(0, 3)
    turn_weights = {-1: 0.15, 0: 0.70, 1: 0.15}
该结构是一个经典的移动代理框架:
This structure is a classical mobility agent framework:
directions:表示四个可能的方向——上、右、下、左。
directions: Represents four possible directions—up, right, down, and left.
current_dir_index:选择智能体最初面向的四个方向中的哪一个。
current_dir_index: Picks which of the four directions the agent faces initially.
turn_weights:概率分布,决定智能体在每一步中左转(-1)、直行(0)或右转(1)的可能性。在本例中,左转的概率为 15%,直行的概率为 70%,右转的概率为 15%。这为智能体的移动引入了随机性,是对人类或类智能体移动模式的一种简单近似。
turn_weights: Probability distribution dictating how likely the agent is to turn left (-1), go straight (0), or turn right (1) at each step. In our case, there is a 15% chance of turning left, a 70% chance of continuing, and a 15% chance of turning right. This introduces randomness in how the agent moves and is a simple approximation of human or agent-like mobility patterns.
    for _ in range(num_points):
        turn = random.choices(list(turn_weights.keys()),
                              weights=list(turn_weights.values()))[0]
        current_dir_index = (current_dir_index + turn) % len(directions)
        dx, dy = directions[current_dir_index]
        new_x = x + dx
        new_y = y + dy
        ...
        trajectory.append((day, timeslot, x, y))
        grid[x, y] = 1
        timeslot = (timeslot + random.randint(1, 3)) % 48
让我们一起来看看虚拟代理的行动:
Let’s go through the actions of our virtual agent:
根据 turn_weights,智能体随机决定是继续沿同一方向行驶、左转还是右转。
Based on turn_weights, the agent randomly decides whether to continue in the same direction, turn left, or turn right.
dx, dy are the increments along x and y for the chosen direction.
The new position, (new_x, new_y), is computed.
The tuple (day, timeslot, x, y) is appended to trajectory.
The function marks grid[x, y] as 1, signifying a visited cell.
timeslot = (timeslot + random.randint(1, 3)) % 48:时间段前进 1 到 3 步,保持在 [0, 47] 范围内。
timeslot = (timeslot + random.randint(1, 3)) % 48: The timeslot advances by 1 to 3 steps, staying in [0, 47].
    missing_indices = random.sample(range(len(trajectory)),
                                    min(missing_count, len(trajectory)))
    for idx in missing_indices:
        d, t, _, _ = trajectory[idx]
        trajectory[idx] = (d, t, 999, 999)
缺失点的确定分两步进行:
The missing points are determined in two steps:
从轨迹的全部 num_points 个点中随机抽取 missing_count 个点。
Randomly sample missing_count points from the total num_points of the trajectory.
将这些点的 (x, y) 替换为 999, 999。
Replace their (x, y) with 999, 999.
In the paper, the authors define 999, 999 as the signal for unknown or missing coordinates that the LLM must later fill in. This code snippet simulates exactly that scenario—some coordinates go missing, requiring an imputation or prediction step.
接下来,我们希望添加一个可视化功能,帮助用户查看轨迹及其缺失点。
We want to add a visualization function next that will help the user to see the trajectory and its missing points.
We will plot the grid and trajectory in Matplotlib:
x_coords = [x if x != 999 else np.nan for _, _, x, y in trajectory]
y_coords = [y if y != 999 else np.nan for _, _, x, y in trajectory]
plt.figure(figsize=(8, 8))
plt.plot(x_coords, y_coords, marker='o', linestyle='-',
color='blue', label="Agent Trajectory")
...
plt.quiver(...)
...
plt.title("Agent Trajectory with Direction Arrows and Missing Data")
plt.xlabel("X coordinate")
plt.ylabel("Y coordinate")
plt.grid(True)
plt.legend()
plt.savefig("mobility.png")
plt.close()
Let’s go through the visualization process:
将 999, 999 值转换为 np.nan,使 Matplotlib 断开线条,而不是在视觉上连接这些缺失点。
Converting 999, 999 values into np.nan makes Matplotlib break the line instead of connecting the missing points visually.
方向箭头(plt.quiver)表示从每一点到下一点的方向。
Direction arrows (plt.quiver) show the direction from each point to the next.
缺失的数据点用品红色的 'X' 标记突出显示。
Missing data points are highlighted with a magenta 'X' marker.
图像保存为 mobility.png。
The figure is saved as mobility.png.
这种绘图方式与论文案例研究部分(4.4 节)的风格一致,该部分作者比较了实际轨迹和预测轨迹。在这里,您只需展示合成路径以及缺失数据的视觉指示即可。
Such plotting mirrors the style in the paper’s Case Study section (Section 4.4), where the authors compare real versus predicted trajectories. Here, you’re simply illustrating the synthetic path as well as the visual indications of missing data.
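The full plotting logic, including the elided `plt.quiver` call, can be sketched as follows. The small trajectory, the arrow filtering, and the marker placement here are illustrative choices for a runnable sketch, not the book's exact code:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical trajectory: (day, timeslot, x, y), with one 999,999 gap
trajectory = [(228, 6, 10, 10), (228, 8, 11, 11), (228, 10, 999, 999),
              (228, 12, 13, 12), (228, 14, 14, 13)]

# Mask missing points as NaN so Matplotlib breaks the line there
x = np.array([p[2] if p[2] != 999 else np.nan for p in trajectory], dtype=float)
y = np.array([p[3] if p[3] != 999 else np.nan for p in trajectory], dtype=float)

plt.figure(figsize=(8, 8))
plt.plot(x, y, marker='o', linestyle='-', color='blue', label="Agent Trajectory")

# Direction arrows between consecutive known points (gaps are skipped)
dx, dy = np.diff(x), np.diff(y)
ok = ~(np.isnan(dx) | np.isnan(dy))
plt.quiver(x[:-1][ok], y[:-1][ok], dx[ok], dy[ok],
           angles='xy', scale_units='xy', scale=1, color='red', width=0.004)

# Mark each gap with a magenta 'X' at the last known location
for i, p in enumerate(trajectory):
    if p[2] == 999 and i > 0:
        plt.scatter(x[i - 1], y[i - 1], marker='X', s=120,
                    color='magenta', label="Missing data")

plt.title("Agent Trajectory with Direction Arrows and Missing Data")
plt.xlabel("X coordinate")
plt.ylabel("Y coordinate")
plt.grid(True)
plt.legend()
plt.savefig("mobility.png")
plt.close()
```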
The output of the function that we will process contains the grid and the trajectory:
return grid, trajectory
These two variables will contain what our generative AI model needs to make a prediction:
- `grid`: A two-dimensional array marking the visited path
- `trajectory`: A list of (day, timeslot, x, y) tuples, with some replaced with `999, 999`
This final result will be fed into an LLM-based approach (such as the one described in the paper) with an OpenAI generative AI model that can produce an acceptable output in a zero-shot process. We will now begin to process the trajectory simulation.
The trajectory simulation has generated the grid, the trajectory, and the missing coordinates in the trajectory. We will now develop the orchestrator function that integrates both the trajectory simulation and the predictive capabilities of the OpenAI model. We’ll call this orchestrator handle_mobility_orchestrator().
This orchestrator aligns with the method outlined by Tang et al. (2024) in their paper Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction. Its purpose is straightforward yet powerful, performing three critical functions:
- It calls the `create_grid_with_trajectory()` function to simulate a trajectory with possible missing points.
- It invokes the LLM reasoning thread (`reason.mobility_agent_reasoning_thread()`).
- It merges the results (`reasoning_steps`) to include both the newly generated trajectory data and the LLM reasoning steps.
This approach remains true to the Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction paper, where the authors emphasize creating structured input data—such as trajectories with missing points—and then passing it to an LLM for completion or prediction.
Let’s now go through the orchestrator step by step:
def handle_mobility_orchestrator(
muser_message1, msystem_message_s1, mgeneration,
mimcontent4, mimcontent4b
):
Immediately, it invokes the trajectory simulation function we built previously:
grid, trajectory = create_grid_with_trajectory(
grid_size=200, num_points=50, missing_count=5
)
trajectory_json = json.dumps({"trajectory": trajectory}, indent=2)
#print("Trajectory Data (JSON):\n", trajectory_json)
muser_message = f"{muser_message1}\n\nHere is the trajectory data:\n{trajectory_json}"
This code takes care of converting the trajectory and augmenting the user message:
- `trajectory_json` becomes a serialized version of the data, `{"trajectory": [...list of (day, timeslot, x, y)...]}`, so it can be embedded in text messages or API calls.
- The code takes the original user message (`muser_message1`) and appends the newly generated trajectory data to it.
This step closely mirrors the Q&A-style interaction presented by Tang et al. (2024), where the trajectory data—marked clearly by placeholders (999, 999)—is delivered directly to the model.
reasoning_steps = reason.mobility_agent_reasoning_thread(
muser_message, msystem_message_s1, mgeneration,
mimcontent4, mimcontent4b
)
Here’s what happens behind the scenes:
- `reason.mobility_agent_reasoning_thread(...)` processes the mobility prediction logic through the selected LLM (such as GPT-4o).
- The message parameters (`msystem_message_s1`, `mgeneration`, `mimcontent4`, and `mimcontent4b`) represent clear instructions and specific context for the generative AI model, guiding its reasoning and predictions.
This mirrors the approach described in Tang et al.’s paper, where the model receives structured input data and is prompted to infer missing trajectories or forecast next movements.
reasoning_steps.insert(
0, ("Generated Trajectory Data:", trajectory)
)
return reasoning_steps
Next, let’s develop the AI reasoning function that the handler registry will call upon.
In this section, we’ll develop the function that allows our GenAISys to process mobility-related user messages. Specifically, we’ll implement a function named handle_mobility(user_message) that integrates seamlessly into the AI functions of our GenAISys.
We’ll approach this task in two main parts:
This aligns closely with the trajectory completion methodology described in Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction, where structured prompts significantly enhance predictive accuracy.
We have four main message variables to send to the OpenAI function:
- `msystem_message_s1`: System message
- `mgeneration`: Generation message
- `mimcontent4`: Additional context
- `muser_message1`: User message
They each serve a distinct purpose in the final prompt that goes to the LLM (GPT-4o or similar) for the prediction task. The system message will set the stage for the task.
The system message sets the overall context and constraints for the LLM, ensuring the model clearly understands its main objectives. The system message is stored in msystem_message_s1. We first specify the role of the model:
msystem_message_s1 = """
You are GPT-4o, an expert in grid-based mobility analysis. Your task is to analyze the provided trajectory dataset and **identify missing coordinates** flagged as `999,999`, then predict their correct values.
Now, we clearly detail the tasks expected in explicit natural language:
**Task:**
1. **Process only the dataset provided in the user input. Do not generate or use your own sample data.**
2. Identify **every single** instance where `x` or `y` is `999`, including consecutive and scattered occurrences.
3. Predict the missing coordinate values based on the trajectory pattern.
4. **Do not modify, reorder, or filter the data in any way**—your response must reflect the dataset exactly as given except for replacing missing values.
5. Before responding, **validate your output** against the original dataset to confirm completeness and accuracy.
6. Maintain the exact order of missing values as they appear in the dataset.
7. Include a debugging step: **first print the list of detected missing values before structuring the final JSON output**.
The output format is specified:
**Output Format:**
```json
{"predicted_coordinates": [[day, timeslot, x, y], ...]}
```
These instructions mirror the approach of the paper we are implementing—the system message clarifies the role of the model and the task instructions, effectively reducing confusion or hallucination. The paper shows how a well-structured instruction block significantly boosts accuracy. Now, we can build the generation message.
This secondary prompt provides generation instructions that will reinforce how the model should handle the data:
mgeneration = """
Scan the user-provided trajectory data and extract **every** point where either `x` or `y` equals `999`.
You must process only the given dataset and not generate new data.
Ensure that all missing values are explicitly listed in the output without skipping consecutive values, isolated values, or any part of the dataset. **Before responding, verify that all occurrences match the input data exactly.**
Then, predict the missing values based on detected trajectory movement patterns. **Provide a corrected trajectory with inferred missing values.**
To assist debugging, **first print the detected missing values list as a pre-response validation step**, then return the structured JSON output.
"""
This prompt focuses on scanning for missing values, ensuring none are skipped. Then, it addresses the next step: provide the corrected trajectory with inferred missing values.
To make sure we obtain what we wish, we will now add additional context. The role of this additional context is to supplement the system/generation messages with domain-specific context:
mimcontent4 = """
This dataset contains spatial-temporal trajectories where some coordinate values are missing and represented as `999,999`. Your goal is to **identify these missing coordinates from the user-provided dataset only**, then predict their correct values based on movement patterns. Ensure that consecutive, isolated, and scattered missing values are not omitted. **Before generating the final response, validate your results and confirm that every missing value is properly predicted.**
"""
This additional context further guides the generative AI model toward precise predictions. We will now engineer a user message to further instruct the model.
It’s time to emphasize the instructions further to make sure we provide even more context to the input. The user message expresses the user’s explicit request. It references the actual dataset with missing points. Realistically, in your code, you’ll append or embed the actual trajectory data (with 999, 999 placeholders) before passing it to the generative AI model:
muser_message1 = """
Here is a dataset of trajectory points. Some entries have missing coordinates represented by `999,999`.
You must process only this dataset and **strictly avoid generating your own sample data**.
Please identify **all occurrences** of missing coordinates and return their positions in JSON format, ensuring that no values are skipped, omitted, or restructured. Then, **predict and replace** the missing values using trajectory movement patterns.
Before returning the response, **first output the raw missing coordinates detected** as a validation step, then structure them into the final JSON output with predicted values.
"""
Let’s fit the message together.
The four messages converge to direct the generative AI model:
- The system message (`msystem_message_s1`) sets the stage and enforces top-level policies.
- The generation message (`mgeneration`) clarifies the approach for scanning, verifying, and predicting.
- The additional context (`mimcontent4`) ensures domain clarity.
- The user message (`muser_message1`) includes the data that needs to be processed (the partial or missing trajectory).
Together, they form the structure of a zero-shot advanced generative model’s prediction.
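As a rough sketch, the four messages could be assembled into an OpenAI-style `messages` list as follows. The abbreviated prompt strings and the exact concatenation order are assumptions for illustration; in the book's code, the `make_openai_api_call` wrapper performs the real assembly:

```python
import json

# Abbreviated stand-ins for the four full prompts (assumed content)
msystem_message_s1 = "You are GPT-4o, an expert in grid-based mobility analysis. ..."
mgeneration = "Scan the user-provided trajectory data for 999,999 values. ..."
mimcontent4 = "This dataset contains spatial-temporal trajectories. ..."
muser_message1 = "Here is a dataset of trajectory points. ..."

# A tiny sample trajectory with one missing point
trajectory = [(228, 6, 999, 999), (228, 7, 69, 79)]
trajectory_json = json.dumps({"trajectory": trajectory}, indent=2)

# One plausible assembly: policies and domain context as the system turn,
# generation instructions plus data as the user turn
messages = [
    {"role": "system", "content": msystem_message_s1 + "\n" + mimcontent4},
    {"role": "user", "content": mgeneration + "\n" + muser_message1
                     + "\n\nHere is the trajectory data:\n" + trajectory_json},
]
print(messages[0]["role"], messages[1]["role"])  # system user
```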
Now, let’s fit the message into the OpenAI API function. These messages are stored in commons/cot_messages_c6.py to be imported by the OpenAI API function.
We will now create an AI mobility function for the AI function section in our GenAISys when we integrate it:
def handle_mobility(user_message):
We will now import the messages we stored in cot_messages_c6.py:
from cot_messages_c6 import (
msystem_message_s1, mgeneration, mimcontent4,muser_message1
)
We’ll now complete the function so that we can call it further in this program by plugging the messages in the generative AI call and return the reasoning steps:
mimcontent4b=mimcontent4
We can now call the mobility orchestrator and return its reasoning steps:
#call Generic Synthetic Trajectory Simulation and Predictive System
reasoning_steps = handle_mobility_orchestrator(
muser_message1, msystem_message_s1, mgeneration,
mimcontent4, mimcontent4b)
return reasoning_steps
We then create the mobility_agent_reasoning_thread function in the reason.py library we have been implementing in the previous chapters of this book. We first create the function:
# Implemented in Chapter08
def mobility_agent_reasoning_thread(
input1,msystem_message_s1,mumessage4,mimcontent4,mimcontent4b
):
Then, we initialize the reasoning steps to display them in VBox:
steps = []
# Display the VBox in the interface
display(reasoning_output)
#Step 1: Mobility agent
steps.append("Process: the mobility agent is thinking\n")
with reasoning_output:
reasoning_output.clear_output(wait=True)
print(steps[-1]) # Print the current step
We then plug the messages received into the standard make_openai_api_call function that we have been using in the previous chapters and return the steps:
mugeneration=msystem_message_s1 + input1
mrole4 = "system"
mcontent4 = mimcontent4
user_role = "user"
create_response = make_openai_api_call(
mugeneration,mrole4,mcontent4,user_role
)
steps.append(f"Customer message: {create_response}")
return steps
We are now ready to run the trajectory simulation and prediction.
With our mobility functions built and clearly defined, we can now run the complete trajectory pipeline—generating synthetic trajectory data, identifying missing coordinates, and predicting them with a zero-shot LLM. This section will demonstrate the end-to-end execution and interpretation of results.
We’ll use a simple, generic prompt to initiate the mobility analysis:
user_message="Check the delivery path"
output=handle_mobility(user_message)
This triggers the entire pipeline we set up previously, from synthetic data generation to coordinate predictions.
To clearly illustrate the trajectory and missing points, the system generates a visual plot (mobility.png). We can display this image directly:
# Display mobility.png if it exists and the "Mobility" instruction is selected
if os.path.exists("mobility.png"):
original_image = PILImage.open("mobility.png")
display(original_image)
The output contains the grid, the trajectory, and the missing data, as shown in Figure 8.3:
Figure 8.3: Trajectory and missing data
The output is plotted with colors, arrows, and text:

- Direction arrows (`plt.quiver`) in red show the direction from each point to the next.

Then, we print the raw output:
print(output)
The output displayed is a single, unstructured line containing trajectory data and predictions:
[('Generated Trajectory Data:', [(50, 28, 999, 999), (50, ….
Clearly, we need to present this data more intuitively. Let’s create a function to display a nice, formatted response:
def transform_openai_output(output):
"""
Takes the 'output' (a list/tuple returned by OpenAI) and transforms
it into a nicely formatted multiline string.
"""
The code breaks the output into well-presented lines:
…
lines = []
…
# Join all lines into one neatly formatted string
return "\n".join(lines)
We then call the function to obtain the formatted output:
pretty_response = transform_openai_output(output)
print(pretty_response)
The output contains the three-step process we built:
The output first contains the trajectory:
Generated Trajectory Data:
(228, 6, 999, 999)
(228, 7, 69, 79)
(228, 9, 70, 79)
(228, 11, 71, 79)
(228, 13, 71, 78)
(228, 16, 71, 77)
(228, 18, 71, 76)
(228, 21, 71, 75)
(228, 24, 71, 74)
(228, 26, 70, 74)
(228, 27, 70, 73)
(228, 29, 70, 72)
(228, 32, 999, 999)
…
Note the records with missing data containing 999 for x,y coordinates. Take the following example:
(228, 6, 999, 999)
The second step is the OpenAI GPT-4o thinking through the problem to isolate the missing data and display it:
Process: the mobility agent is thinking
Customer message: **Detected Missing Coordinates:**
1. [228, 6, 999, 999]
2. [228, 32, 999, 999]
3. [228, 9, 999, 999]
4. [228, 45, 999, 999]
5. [228, 47, 999, 999]
The third step is for the OpenAI generative AI to predict the missing data:
**Predicted Missing Coordinates:**
The predictions are then displayed with explanations:
1. [228, 6, 69, 79] - Based on the trajectory pattern, the missing values at timeslot 6 are likely to be the same as the next known values at timeslot 7.
2. [228, 32, 69, 72] - Interpolating between timeslot 29 (70, 72) and timeslot 33 (68, 72), the missing values at timeslot 32 are predicted to be (69, 72).
3. [228, 9, 64, 72] - The missing values at timeslot 9 are interpolated between timeslot 7 (64, 71) and timeslot 10 (64, 73), resulting in (64, 72).
4. [228, 45, 58, 81] - Interpolating between timeslot 43 (58, 82) and timeslot 46 (58, 80), the missing values at timeslot 45 are predicted to be (58, 81).
5. [228, 47, 58, 79] - The missing values at timeslot 47 are interpolated between timeslot 46 (58, 80) and timeslot 1 (58, 78), resulting in (58, 79).
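The interpolation the model describes can be reproduced deterministically. Here is a minimal sketch, assuming simple linear interpolation between the nearest known neighbors; the rounding and edge-gap handling are illustrative assumptions, not the LLM's internal logic:

```python
def interpolate_missing(trajectory):
    """Fill (999, 999) gaps by linear interpolation between the
    nearest known neighbors (illustrative, not the LLM's exact logic)."""
    filled = list(trajectory)
    known = [i for i, p in enumerate(filled) if p[2] != 999]
    for i, (d, t, x, y) in enumerate(filled):
        if x != 999:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is not None and nxt is not None:
            w = (i - prev) / (nxt - prev)  # position of the gap between neighbors
            x_new = round(filled[prev][2] + w * (filled[nxt][2] - filled[prev][2]))
            y_new = round(filled[prev][3] + w * (filled[nxt][3] - filled[prev][3]))
        else:  # gap at an edge: copy the nearest known point
            k = nxt if prev is None else prev
            x_new, y_new = filled[k][2], filled[k][3]
        filled[i] = (d, t, x_new, y_new)
    return filled

# Reproduces prediction #2 above: the gap at timeslot 32 sits between
# (70, 72) at timeslot 29 and (68, 72) at timeslot 33
traj = [(228, 29, 70, 72), (228, 32, 999, 999), (228, 33, 68, 72)]
print(interpolate_missing(traj)[1])  # (228, 32, 69, 72)
```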
The output also contains the predictions in JSON:
```json
{
"predicted_coordinates": [
[228, 6, 69, 79],
[228, 32, 69, 72],
[228, 9, 64, 72],
[228, 45, 58, 81],
[228, 47, 58, 79]
]
}
```
The results are acceptable and show that recent generative AI models have zero-shot capabilities to make predictions on missing data in sequences.
However, the real power lies in extending these predictions to a wide range of real-world applications. The next logical step is to integrate this functionality into our GenAISys interface, allowing users to customize prompts easily to suit diverse trajectory-related use cases.
Let’s move forward to implement this user-friendly integration.
We will now integrate the trajectory simulation and prediction component into our GenAISys, allowing users to design domain-specific prompts. At the user interface level, we’ll simplify the terminology from “trajectory simulation and prediction” to the user-friendly term “mobility.” This shorter label is more intuitive for users, though technical documentation can maintain detailed terminology as required. Then it will be up to the users to decide what domain-specific terminology they wish to see in the interface.
We will add the mobility function we built in 1_Trajectory_simulation_and_prediction.ipynb to the GenAISys at three levels, as shown in Figure 8.4:
Figure 8.4: Integrating the trajectory simulation and prediction pipeline into the GenAISys
Open the 2_GenAISys_Mobility.ipynb notebook. If needed, review the handler selection mechanism described in Chapter 7 before continuing here. The notebook is not designed for voice outputs of lists of coordinates. As such, gTTS is best deactivated by default with use_gtts = False at the top of the notebook.
Let’s first enhance the IPython interface.
The mobility option is primarily added to these parts of the IPython interface:
- The `instruct_selector` dropdown, with Mobility as one of its possible values
- `update_display()`, which checks whether the user selected Mobility and, if so, displays the `mobility.png` file
- `handle_submission()`, where the code prints `"Thinking..."` if `instruct_selector.value` is `"Analysis"`, `"Generation"`, or `"Mobility"`
- The display logic, where the mobility image (`mobility.png`) is only displayed when the Files widget is checked
We will begin by adding the option to the interface. We will create and add an option to instruct_selector and then handle the trajectory image display and submission code. Let’s begin with the option in the interface.
We will first add the Mobility option to the Reasoning drop-down list, as illustrated in Figure 8.5:
instruct_selector = Dropdown(
options=["None", "Analysis", "Generation","Mobility"],
value="None",
description='Reasoning:',
layout=Layout(width='50%')
)
instruct_selector.observe(on_instruct_change, names='value')
Figure 8.5: Adding Mobility to the dropdown
We can then select Mobility, as shown in Figure 8.6:
Figure 8.6: Selecting Mobility to activate the pipeline
Mobility is now selected. Notice the default model is set to OpenAI; however, you may extend this to other models, such as DeepSeek, during later phases, depending on your project needs.
Let’s now handle the “mobility” value when we update the display.
We must ensure the generated trajectory visualization (mobility.png) is automatically shown when the Mobility option is selected and the Files checkbox is enabled:
def update_display():
clear_output(wait=True)
...
# Display c_image.png if it exists
if files_checkbox.value == True:
…
# Display mobility.png if "Mobility" is selected
if (
os.path.exists("mobility.png")
and instruct_selector.value == "Mobility"
):
original_image = PILImage.open("mobility.png")
display(original_image)
The image created by the trajectory simulation will be displayed. We now need to enhance the submission logic outputs to run the AI functions.
The chat_with_gpt function is called as before but it interacts directly with the handler selection mechanism (described in the next section):
response = chat_with_gpt(
user_histories[active_user], user_message, pfiles,
active_instruct, models=selected_model
)
However, we will add the mobility functionality to the submission handling function:
def handle_submission():
user_message = input_box.value.strip()
…
if instruct_selector.value in [
"Analysis", "Generation","Mobility"
]:
with reasoning_output:
reasoning_output.clear_output(wait=True)
…
We will now add the mobility function to the handler selection mechanism.
The handler selection mechanism contains two main parts. The first component, chat_with_gpt, remains unchanged from previous chapters and is directly called by the IPython interface:
def chat_with_gpt(
messages, user_message, files_status, active_instruct, models
):
The second component is the handler registry, to which we’ll now add the newly developed mobility handler:
handlers = [
…
# Mobility handler: determined by the instruct flag
(
lambda msg, instruct, mem, models, user_message, **kwargs:
instruct == "Mobility",
lambda msg, instruct, mem, models, user_message, **kwargs:
handle_mobility(user_message, models=models)
),
…
This ensures that when users select Mobility from the Reasoning dropdown in the interface, the appropriate handler is activated automatically. We can see that the handler selection mechanism can be seamlessly scaled. Let’s now add the functions we developed for this mobility function to the AI functions library.
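The scan over the registry can be sketched as a loop over `(predicate, action)` pairs in which the first matching predicate wins. This is a simplified stand-in for the book's `chat_with_gpt` dispatch, with stub handlers in place of the real AI functions:

```python
# Stub handlers standing in for the real AI functions
def handle_mobility(user_message, **kwargs):
    return f"mobility pipeline triggered for: {user_message}"

def handle_default(user_message, **kwargs):
    return "generic chat response"

# (predicate, action) pairs, scanned in order; first match wins
handlers = [
    (lambda msg, instruct, **kw: instruct == "Mobility",
     lambda msg, instruct, **kw: handle_mobility(msg)),
    (lambda msg, instruct, **kw: True,  # fallback handler
     lambda msg, instruct, **kw: handle_default(msg)),
]

def dispatch(user_message, active_instruct):
    for predicate, action in handlers:
        if predicate(user_message, active_instruct):
            return action(user_message, active_instruct)

print(dispatch("Check deliveries", "Mobility"))
print(dispatch("Hello", "None"))
```

Because the fallback predicate always returns `True`, appending a new `(predicate, action)` pair above it is all that is needed to scale the mechanism with another capability.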
Next, we’ll integrate the trajectory simulation and prediction functions—previously developed in the Building the trajectory simulation and prediction section—into the AI functions library within the notebook:
def create_grid_with_trajectory(
grid_size=200, num_points=50, missing_count=5
):
…
This function is added just above the beginning of the functions called by the handler selection mechanism.
def handle_mobility_orchestrator(
muser_message1, msystem_message_s1, mgeneration,
mimcontent4, mimcontent4b
):
…
This function is also added just above the beginning of the functions called by the handler selection mechanism.
We now add the handle_mobility function we developed as well, and add **kwargs to process the arguments sent by the handler mechanism selection function:
def handle_mobility(user_message, **kwargs):
from cot_messages_c6 import (
msystem_message_s1, mgeneration, mimcontent4,muser_message1
)
mimcontent4b=mimcontent4
#call Generic Synthetic Trajectory Simulation and Predictive System
reasoning_steps = handle_mobility_orchestrator(
muser_message1, msystem_message_s1, mgeneration,
mimcontent4, mimcontent4b
)
return reasoning_steps
The code will run exactly like in the Building the trajectory simulation and prediction function section. With this setup, the mobility functionality is fully integrated into the GenAISys ecosystem, ready to be triggered via the intuitive IPython interface. Let’s now get the user involved.
In this section, we will demonstrate the mobility-enhanced GenAISys by running two distinct scenarios—a delivery use case and a fire disaster scenario—to illustrate the versatility of trajectory simulations and predictions, inspired by the work of Tang et al. (2024).
Open the 2_GenAISys_Mobility.ipynb notebook. First, deactivate DeepSeek in the initial setup cell (you will only need a CPU):
deepseek=False
HF=False
Togetheragents=False
Then run the whole notebook. When it’s finished, go to the Running the interface section in the notebook. We need to activate Agent, Files, and Mobility, and leave the default model as OpenAI.
Figure 8.7: Running a delivery check with the mobility function
The synthetic trajectory simulates real-world input data and generates new data each time it runs. The explanations in this section reflect just one of these runs. When you execute the program, you’ll obtain a new output every time, simulating real-time data.
Limit: Currently, the trajectory file is overwritten whenever a new trajectory is generated. If required, this functionality can be expanded during a project to save multiple files instead.
Let’s first explore the mobility function with a delivery example.
To run the production-delivery verification, we simply need to activate Agent and Files, Mobility as the reasoning function, and OpenAI as the model. Then, click on SEND and let the AI do the work. In this case, we can imagine that an online travel agency wants to deliver customized goodies to its customers before a trip, such as a personalized travel bag with their name printed on it, a tourist guide, and a discount booklet to visit museums.
The GenAISys will begin to think, as displayed at the bottom of the interface, as shown in Figure 8.8:
Figure 8.8: The GenAISys has begun to think
The output first displays the synthetic trajectory, which can come from any source, such as real-time data, databases, and sensors. The synthetic trajectory, in this case, is displayed under the user message that indicates the title of the discussion since the mobility function itself is autonomous:
User01: Check deliveries
Agent: ('Generated Trajectory Data:', [(145, 20, 999, 999), (145, 22, 189, 125), (145, 25, 190, 125), (145, 28, 190, 124), (145, 29, 190, 123), (145, 31, 999, 999),
Then, the generative AI function takes over and indicates that it is processing the missing data in the trajectory:
Process: The mobility agent is thinking
When it is finished thinking, it provides the list of missing data:
Customer message: Detected Missing Coordinates:
1. [145, 20, 999, 999]
2. [145, 31, 999, 999]
3. [145, 34, 999, 999]
4. [145, 42, 999, 999]
5. [145, 3, 999, 999]
The GenAISys then provides its prediction:
Predicted Missing Coordinates: To predict the missing coordinates, we will analyze the trajectory pattern before and after each missing value:
Predicted Coordinates:
1. [145, 20, 189, 125] - Based on the pattern before and after the missing value.
2. [145, 31, 189, 122] - Interpolated from the surrounding points.
3. [145, 34, 189, 121] - Interpolated from the surrounding points.
4. [145, 42, 191, 128] - Based on the pattern before and after the missing value.
5. [145, 3, 190, 124] - Interpolated from the surrounding points.
A structured JSON output is provided for integration with other systems or further processing:
```json
{
  "predicted_coordinates": [
    [145, 20, 189, 125],
    [145, 31, 189, 122],
    [145, 34, 189, 121],
    [145, 42, 191, 128],
    [145, 3, 190, 124]
  ]
}
```
The original trajectory is also displayed for the user to make decisions, as shown in Figure 8.9:
Figure 8.9: The original trajectory with missing data
Now, three subject-matter expert (SME) agents can intervene:
Let’s now deactivate the Mobility function and Files display to put the cutting-edge generative AI to work, as shown in Figure 8.10:
Figure 8.10: Generative AI takes over and generates messages
At this stage, a human user can interpret the AI’s output and craft a tailored prompt. For instance, consider this scenario:
The missing x,y coordinates with 999,999 respectively in this dialog are deliveries that were not made! The predicted x,y coordinates are inferred from the coordinates before and after.
Here’s an example of a domain-specific prompt to address the issue:
Write an email to SUPPLIER COMPANY specifying that the missing coordinates from x values 20 to 40 are located in the Midtown mall area. Provide the missing coordinates and their predictions to inform the SUPPLIER COMPANY. Be as precise and explicit as possible.
As demonstrated, such a prompt could easily be adapted to various domains. By simply adjusting the labeling of the coordinates and the intended recipient, you could generate a diverse array of messages. The exact scope and application depend entirely on the requirements of your specific project.
The generative AI’s response to the preceding prompt was acceptable and detailed:
Dear [Supplier Company Contact Name],
I hope this message finds you well. I am writing to inform you about certain discrepancies identified in the delivery data for the Midtown Mall area. Specifically, there are missing coordinates in the trajectory data, which indicate deliveries that were not completed. These missing coordinates are represented by the values 999,999.
Below, I have detailed the missing coordinates along with their predicted values, which have been inferred based on the surrounding data points:
1. Missing Coordinate: [145, 20, 999, 999]
- Predicted Coordinate: [145, 20, 189, 125]
2. Missing Coordinate: [145, 31, 999, 999]
- Predicted Coordinate: [145, 31, 189, 122]
3. Missing Coordinate: [145, 34, 999, 999]
- Predicted Coordinate: [145, 34, 189, 121]
4. Missing Coordinate: [145, 42, 999, 999]
- Predicted Coordinate: [145, 42, 191, 128]
5. Missing Coordinate: [145, 3, 999, 999]
- Predicted Coordinate: [145, 3, 190, 124]
These missing deliveries occurred within the x-value range of 20 to 40, specifically in the Midtown Mall area. The predicted coordinates have been calculated by analyzing the trajectory data pattern before and after the missing points.
We kindly request that you review this information and take the necessary actions to address these unfulfilled deliveries. Your prompt attention to this matter will help ensure that our delivery operations continue to run smoothly and efficiently.
Please feel free to reach out if you require further details or clarification regarding this issue.
Thank you for your cooperation.
Best regards,
[Your Full Name]
[Your Position]
[Your Company Name]
[Your Contact Information]
Next, without restarting the session, let’s ask our GenAISys how this methodology could be applied to a fire disaster scenario.
Now, let’s explore how the concepts of trajectory analysis, missing coordinates, and predicted coordinates can be leveraged in disaster prediction scenarios, specifically in cases such as forest fires. We’ll submit this scenario to the GenAISys and analyze its insights:
User01: How could the trajectory, missing coordinates, and predicted coordinates be used for disaster prediction such as fires? Explain and create an output for this use case.
Agent: Trajectory data, missing coordinates, and predicted coordinates can be valuable tools in disaster prediction and management, such as for fires, in the following ways:
1. Trajectory Analysis: By analyzing the movement patterns of vehicles, drones, or other mobile units, authorities can identify areas that are frequently visited or monitored. This can help in understanding the normal patterns of activity and detecting anomalies that might indicate a disaster, such as a sudden deviation from a typical route due to a fire.
2. Missing Coordinates: Missing data points can indicate areas where monitoring or data collection was interrupted, possibly due to a disaster like a fire. Identifying these gaps can help in pinpointing locations that need immediate attention or further investigation.
3. Predicted Coordinates: By predicting the missing coordinates, authorities can fill in the gaps in the data to maintain a continuous monitoring system. This can help in reconstructing the events leading up to a disaster and in planning the response by understanding the spread and impact of the fire.
### Use Case Output for Disaster Prediction (Fires)
Scenario: A fire has been reported in a forest area, and trajectory data from drones monitoring the region is being analyzed to assess the situation.
Output:
{
  "fire_prediction_analysis": {
    "trajectory_analysis": "Drones deviated from their usual path at coordinates [145, 20] to [145, 40], indicating potential fire activity.",
    "missing_coordinates": [
      {"missing_coordinate": [145, 25, 999, 999], "predicted_coordinate": [145, 25, 190, 130]},
      {"missing_coordinate": [145, 30, 999, 999], "predicted_coordinate": [145, 30, 191, 128]}
    ],
    "predicted_fire_area": [
      {"coordinate": [145, 25, 190, 130]},
      {"coordinate": [145, 30, 191, 128]}
    ],
    "actionable_insights": "Immediate investigation and firefighting efforts are recommended in the predicted fire area to prevent further spread."
  }
}
This output provides a structured analysis of the situation, helping disaster management teams to quickly identify and respond to potential fire threats based on trajectory data and inferred information.
This structured output offers valuable insights, enabling disaster response teams to swiftly identify and respond to potential threats based on trajectory analysis, pinpointed data gaps, and predictive coordinates.
This methodology demonstrates that we can craft numerous specialized prompts across domains. Despite inevitable limitations, the era of GenAISys is just beginning, continually expanding into new, uncharted applications.
In this chapter, we began by recognizing that robust trajectory analysis is essential for applications ranging from deliveries and epidemic forecasting to city-scale planning. Guided by the innovative approach outlined in Tang, P., Yang, C., Xing, T., Xu, X., Jiang, R., and Sezaki, K. (2024), we emphasized the transformative potential of text-based LLMs for mobility prediction. Their framework directed our design of a method capable of intelligently filling gaps in real-time synthetic datasets through carefully structured prompts.
We then built a Python-based trajectory simulator that randomizes movement on a grid, mirroring typical user paths. It assigns day and timeslot indices, which enabled us to capture the temporal aspect of mobility. Critically, we inserted synthetic gaps marked as 999, 999, approximating real-world data dropouts or missing logs. Next, we integrated an orchestrator function that adds instructions with this synthetic data before directing them to an LLM, in this case, an OpenAI GPT-4o model. The orchestrator composes prompts that accurately reflect the trajectory dataset, focusing the model’s attention on flagged gaps. It employs a chain-of-thought routine, noting missing points for debugging prior to generating final JSON outputs.
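As a rough sketch of what such a simulator can look like — the function name, grid size, and gap rate below are illustrative assumptions, not the book's exact code:

```python
import random

def simulate_trajectory(n_points=10, grid=200, gap_rate=0.2, day=145, seed=42):
    """Random-walk trajectory on a grid with synthetic (999, 999) dropouts.

    Each point is (day, timeslot, x, y); a fraction of points is replaced
    by the 999, 999 marker to mimic real-world missing logs.
    """
    random.seed(seed)
    x, y = grid // 2, grid // 2
    points = []
    for slot in range(n_points):
        # Random step, clamped to the grid boundaries
        x = max(0, min(grid - 1, x + random.choice([-1, 0, 1])))
        y = max(0, min(grid - 1, y + random.choice([-1, 0, 1])))
        if random.random() < gap_rate:
            points.append((day, slot, 999, 999))   # synthetic gap
        else:
            points.append((day, slot, x, y))
    return points

print(simulate_trajectory(5))
```

The day index 145 mirrors the example output shown earlier in the chapter; in practice, it would come from the simulator's date assignment.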
We then merged this pipeline into the GenAISys environment by adding a dedicated mobility handler in the multihandler system. This handler streamlines the full process: trajectory generation, model inference, and visualization all in one place. Users can prompt the system to evaluate missing coordinates and instantly see the updated paths superimposed on a static city grid. Ultimately, we demonstrated that robust GenAISys forecasting need not remain abstract when grounded in purposeful prompt design.
In the next chapter, we will open the GenAISys to the world with an external service that will lead us to enhance our system with security and moderation functionality.
New frameworks, evolving architectures, research drops, production breakdowns—AI_Distilled filters the noise into a weekly briefing for engineers and researchers working hands-on with LLMs and GenAI systems. Subscribe now and receive a free eBook, along with weekly insights that help you stay focused and informed.
Subscribe at https://packt.link/TRO5B or scan the QR code below.
In this chapter, we will open up our GenAISys by integrating it with real-world online services—specifically, by connecting it to an online weather API. This will enable the fictional online travel agency that we’ve been supporting throughout the book to access real-time weather data. Weather reports for a specific location serve as an entry point for various tasks essential to the agency’s operations, such as marketing initiatives, recommendations for tourist activities, and coordinating product deliveries.
Connecting our GenAISys to external online resources transitions our system from a controlled internal environment to the unpredictable realm of real-time data interactions. This transition, however, introduces critical security concerns. Opening a system without adequate protections can inadvertently expose sensitive data or cause security breaches, posing genuine risks both to users and the organization itself. As such, robust security measures are a prerequisite before fully integrating external services. Therefore, this chapter presents a threefold challenge: implementing the weather service using the OpenWeather API, building a moderation system leveraging OpenAI’s moderation capabilities, and developing a RAG-based data security function with Pinecone, which will detect and prevent sensitive-topic breaches. We will rely heavily on the flexible and powerful handler selection mechanism of our existing GenAISys architecture to seamlessly integrate these new functionalities. Our objective remains clear—minimal code enhancements with maximum functional impact.
The chapter begins by detailing how these additional components—moderation, data security, and real-time weather—fit into the overall architecture of the GenAISys. Then, we’ll dive under the surface to build the moderation function using OpenAI’s moderation endpoint. We will then construct a RAG-powered data security module, using Pinecone, to proactively detect and filter out sensitive or inappropriate inputs. By integrating these two security layers directly into the GenAISys’s handler selection mechanism, we ensure comprehensive protection against unwanted interactions.
With security firmly in place, we will then implement the OpenWeather API. This integration allows us to retrieve live, real-time weather information to power a range of engaging, user-centric tasks. Finally, we will demonstrate the capabilities of the enhanced GenAISys through practical, multimodal, multi-user scenarios—such as generating weather-based activity recommendations, crafting customized promotional images for travel merchandise, and dynamically creating personalized weather-aware messages for travelers. By the end of this chapter, you’ll be fully equipped to adapt the core architecture and concepts we’ve explored in the GenAISys to real-world applications, confidently delivering a comprehensive, secure, and highly functional proof of concept.
This chapter covers the following topics:
Let’s start by clearly mapping out how the new moderation, data security, and weather functions integrate into our GenAISys.
Integrating moderation, data security, and real-time weather functionalities into our GenAISys will affect all three architectural layers, as illustrated in Figure 9.1. We will rely on the framework built around our handler selection mechanism to make this three-level, three-function implementation seamless.
Figure 9.1: Moderation, data security, and weather report integration
The implementation will impact all three layers as follows:
Figure 9.2: Security and moderation checks are directly managed within the handler selection mechanism
This security system will proactively intercept and evaluate each user input, blocking further processing and immediately alerting the user via the IPython interface if inappropriate or sensitive content is detected.
While building and showcasing your GenAISys as a flexible proof of concept, remember to demonstrate its scalability clearly. Highlight its potential for integration with additional AI functions, further security features, or alternative generative models. However, exercise caution: avoid prematurely adding functionalities without concrete project requirements, as this can lead to unnecessary complexity or overdeveloping your project. The goal is to clearly show the project’s potential without investing resources in unrequested developments.
With this strategic clarity in mind, we will begin by constructing the security function.
In this section, we will build the security function, the moderation function, and the data security function, as illustrated in Figure 9.3:
Figure 9.3: The handler selection mechanism directly calls the security function
We’ll implement these functions as follows:
Let’s first build the security function along with its calls.
The security function is directly integrated with the handler selection mechanism, as illustrated in Figure 9.4. It receives the user message directly, prior to any handler selection in the handler registry. If the user message violates the GenAISys content policy, the security function returns a False flag (indicating a content violation) to the IPython interface.
Figure 9.4: The security function is directly connected to the handler selection mechanism
Open GenAISys_Customer_Service.ipynb and go to the Security section within the Chapter09 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main). We first create the function that sets the securitym flag to True, signifying the content does not violate content policies. Also in this function, securityd=True shows that no sensitive topic has been detected and that the message is data security-compliant:
def security(user_message):
securitym=True # default value
securityd=True # default value
Quick tip: Enhance your coding experience with the AI Code Explainer and Quick Copy features. Open this book in the next-gen Packt Reader. Click the Copy button
(1) to quickly copy code into your coding environment, or click the Explain button
(2) to get the AI assistant to explain a block of code to you.
The next-gen Packt Reader is included for free with the purchase of this book. Scan the QR code OR visit packtpub.com/unlock, then use the search bar to find this book by name. Double-check the edition shown to make sure you get the right one.
The function begins by calling the moderation and content acceptability functions:
response=moderation(user_message)
# Moderation
security = is_acceptable(user_message,response)
#print(security) # Outputs: True if acceptable, False otherwise
- moderation(user_message) invokes the OpenAI moderation API
- is_acceptable(user_message, response) processes the moderation response
- The debug line (print(securitym)) can be uncommented during testing or troubleshooting
Next, the security function calls the data security function, which checks for sensitive topics within the user message:
# Data security
securityd = data_security(user_message)
#print securityd
The securityd variable will store either True or False, depending on whether the message meets the data security criteria.
The final part of the function evaluates both flags (securitym and securityd) and returns the security status accordingly:
if securitym==False or securityd==False:
return False
else:
return True
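Assembled, the pieces of the security function look like this; the moderation and data security calls are stubbed out so the sketch runs standalone (the stubs are assumptions replacing the OpenAI and Pinecone calls shown elsewhere in the chapter):

```python
class ModResult:
    def __init__(self, flagged):
        self.flagged = flagged

class ModResponse:
    def __init__(self, flagged):
        self.results = [ModResult(flagged)]

def moderation(user_message):
    # Stub standing in for client.moderations.create(...)
    return ModResponse(flagged=False)

def is_acceptable(user_message, response):
    # True if the moderation response did not flag the message
    return not response.results[0].flagged

def data_security(user_message):
    # Stub standing in for the Pinecone similarity query
    return True

def security(user_message):
    securitym = is_acceptable(user_message, moderation(user_message))
    securityd = data_security(user_message)
    if securitym == False or securityd == False:
        return False
    return True

print(security("Hello, I'd like to book a trip."))  # → True: both stubs pass
```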
If you prefer not to activate the moderation or data security checks, you can comment out the relevant lines in the security function—specifically, the code between the assignments of securitym=True and securityd=True, and the if securitym==False or securityd==False conditional statement. This way, the function defaults to always returning True.
Let’s now examine how the handler selection mechanism interacts with the IPython interface.
The handler selection mechanism and IPython interface interact closely, as shown in Figure 9.5. When the IPython interface sends a user message to the handler selection mechanism, it determines whether the message complies with security policies.
Figure 9.5: Handler selection mechanism and IPython interactions
The primary structure of the handler selection mechanism function remains unchanged from previous chapters:
def chat_with_gpt(
messages, user_message, files_status, active_instruct, models
):
global memory_enabled # Ensure memory is used if set globally
However, at the start of this function, we now call the security function. It returns a security status (True for compliant or False for non-compliant):
try:
if not security(user_message):
return "Your message could not be processed as it may violate our security guidelines."d
If the message is flagged as non-compliant, a clear message will be immediately returned to the user interface. We will test security function examples thoroughly in the Running security checks section later. Before that, let’s move forward by implementing the moderation function.
We will use OpenAI Omni, which has a comprehensive range of categorization options, as the moderation model: https://platform.openai.com/docs/guides/moderation.
Open the GenAISys_Customer_Service.ipynb notebook and navigate to the Moderation subsection within the Security section. The moderation function will be directly invoked by the handler selection mechanism, as illustrated in Figure 9.6. It classifies user messages and provides details about any inappropriate content flagged during processing.
Figure 9.6: Calling the moderation function
We will implement the moderation function with OpenAI Omni as follows:
from openai import OpenAI
client = OpenAI()
def moderation(user_message):
response = client.moderations.create(
model="omni-moderation-latest",
input=user_message,
)
return response
This function processes the user message and returns a moderation response. Once received by the security function, the response is sent to another function, is_acceptable, to evaluate whether the message is acceptable or not:
def is_acceptable(user_message, response):
# Extract the 'flagged' status from the first result
flagged = response.results[0].flagged
The response contains response.results[0].flagged, which provides a True value if the content is flagged or a False status if the response is acceptable. If the content is flagged, the details of the response will be saved in a timestamped JSON file:
if flagged:
# Generate filename based on current date and time
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
sanitized_message = ''.join(
e for e in user_message if e.isalnum()
or e in (' ', '_')
).strip()
filename = \
f"{sanitized_message[:50].replace(' ', '_')}_{timestamp}.json"
# Ensure the 'logs' directory exists
os.makedirs('logs', exist_ok=True)
# Convert the response to a dictionary
response_dict = response.model_dump()
# Write the response to a JSON file in the 'logs' directory
with open(os.path.join('logs', filename), 'w') as file:
json.dump(response_dict, file, indent=4)
Finally, the function returns the negation of the flagged status:
    # Return True if content is acceptable, False otherwise
    return not flagged
If the message is flagged, details are saved in a timestamped JSON file located within the /logs subdirectory. The resulting JSON file contains a unique ID, the model used, and the status of a wide range of categories:
{
"id": "modr-bb021ae067c296c1985fca7ccfd9ccf9",
"model": "omni-moderation-latest",
"results": [
{
"categories": {
"harassment": true,
"harassment_threatening": false,
"hate": false,
"hate_threatening": false,
"illicit": false,
"illicit_violent": false,
"self_harm": false,
"self_harm_instructions": false,
"self_harm_intent": false,
"sexual": false,
"sexual_minors": false,
"violence": false,
"violence_graphic": false,
"harassment/threatening": false,
"hate/threatening": false,
"illicit/violent": false,
"self-harm/intent": false,
"self-harm/instructions": false,
"self-harm": false,
"sexual/minors": false,
"violence/graphic": false
},
"category_applied_input_types": {
"harassment": [
"text"
],
"harassment_threatening": [
"text"
],…
In this case, the harassment category has been flagged, for example. The file also contains a score for each category, as shown in this excerpt from the file:
"category_scores": {
"harassment": 0.8075929522141405,
…
},
"flagged": true
…
To perform a quick quality control check or evaluate specific user messages manually, uncomment and use the following lines in your notebook:
# Uncomment to use as security user message evaluation
user_message="Your ideas are always foolish and contribute nothing to our discussions."
security(user_message)
The file containing the information can be processed further with other functions as required for your project. We could, for example, add the user's ID to the file. We can also view the details of the dialogue to find which user entered the flagged message in the Load and display the conversation history section of the notebook. We will go through these features in the Running security checks section later.
Let’s now build the data security function.
We will now build a data security function designed to detect whether a user message contains sensitive topics. This function covers a wide spectrum of security-related concerns, from safeguarding confidential information to preventing inappropriate or sensitive discussions through the GenAISys interface. The data security function comprises two components, as shown in Figure 9.7: the first component populates a Pinecone index with sensitive topics, and the second component queries this index to detect sensitive topics within user messages.
Figure 9.7: The two components of the data security function
We will first populate the Pinecone index with sensitive topics. Open Pinecone_Security.ipynb. This notebook mirrors the structure of the earlier notebook from Chapter 3 (Chapter03/Pinecone_instruction_scenarios.ipynb). We will simply adapt it here to accommodate sensitive topics. Feel free to revisit Pinecone_instruction_scenarios.ipynb for a detailed refresher if needed.
We’ll briefly cover the code sections specifically adapted for this domain. The first step involves downloading our sensitive_topics dataset:
download("Chapter09","sensitive_topics.csv")
The file, sensitive_topics.csv, contains 100 sensitive topics structured as follows:
- id: A unique identifier ranging from 1000 to 1099
- values: Text descriptions of sensitive topics
For example, the first topic is as follows:
Discussing the mismanagement of client funds that has resulted in significant financial discrepancies which have raised concerns among regulatory bodies and could lead to potential legal actions being taken against the travel agency if corrective measures are not implemented promptly and with strict accountability this issue demands immediate thorough review.
If a user message contains content similar to this topic, for example, it will be flagged. Vector-based similarity searches provide more nuanced detection than traditional methods.
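The intuition behind the vector-based check can be shown with a toy cosine-similarity comparison (the three-dimensional vectors here are invented stand-ins; in the notebook, Pinecone computes similarity over real embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings standing in for real model vectors
topic_vec   = [0.9, 0.1, 0.3]   # e.g., "mismanagement of client funds"
message_vec = [0.8, 0.2, 0.4]   # an incoming user message

score = cosine_similarity(topic_vec, message_vec)
flagged = score > 0.30          # same threshold shape as security_level below
print(round(score, 3), flagged)
```

Semantically close texts produce high scores even when they share few exact words, which is why this approach detects paraphrased sensitive content that keyword filters miss.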
The next step is to chunk the data from the sensitive_topics.csv file:
import time
start_time = time.time() # Start timing
# File path
file_path = 'sensitive_topics.csv'
# Read the file, skip the header, and clean the lines
chunks = []
…
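The elided part of the chunking step could be sketched as follows; this is a guess at the general shape (reading rows into (id, text) pairs), not the notebook's exact code:

```python
import csv

def load_chunks(file_path="sensitive_topics.csv"):
    """Read the CSV, skip the header, and collect cleaned (id, topic) pairs."""
    chunks = []
    with open(file_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if len(row) >= 2 and row[1].strip():
                chunks.append((row[0], row[1].strip()))
    return chunks
```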
We then create a namespace within Pinecone to hold this sensitive data. We will call this namespace security:
from pinecone import ServerlessSpec
index_name = 'genai-v1'
namespace="security"
cloud = os.environ.get('PINECONE_CLOUD') or 'aws'
region = os.environ.get('PINECONE_REGION') or 'us-east-1'
spec = ServerlessSpec(cloud=cloud, region=region)
The upserting function remains unchanged, retaining the same column names as in the .csv file:
def upsert_to_pinecone(batch, batch_size, namespace="security"):
…
With the data now populated in Pinecone, we’ll proceed to implement the query component.
Open GenAISys_Customer_Service.ipynb and navigate to the Data security subsection in the Security section. The goal here is to determine whether a user message matches any sensitive topics stored in the Pinecone security index. If a match is found, the message will be flagged.
Initially, we set default security indicators:
# Define data security function
import datetime
def data_security(user_message):
sec = True # Initialize to True (safe by default)
target_id = 0
score = None # Initialize score
security_level = 0.30 # Set your desired security threshold here (as a float, e.g., 0.0 to 1.0)
The default value, sec, is set to True, meaning that the user message is secure until found otherwise. Additionally, target_id is set to 0 to show that no record in the Pinecone index has been found yet. Additionally, we have a security level threshold, security_level, that can be set to the value deemed necessary in production. You can modify it here in the code or create a security interface once your strategy has been decided with your team. In this case, security_level is set to 0.30 to avoid filtering low-level security items in this educational example.
We then define our namespace for querying:
namespace = "security"
print(namespace)
We then query the Pinecone index to detect sensitive topics:
query_text = user_message
query_results = get_query_results(query_text, namespace)
We now extract the score, which we will use as our security level threshold:
# Extract score directly from query_results BEFORE calling display_results
if (
    query_results
    and "matches" in query_results
    and query_results["matches"]
):
    score = query_results['matches'][0]['score']
    print(f"Extracted Score: {score}")  # Optional: print to verify
Now, we can display the results:
print("Processed query results:")
qtext, target_id = display_results(query_results)
print(qtext)
If a similar sensitive topic is found (indicated by a score that exceeds the threshold), the security logic flags it:
# --- LOGIC FOR SECURITY CHECK ---
# Determine 'sec' based on the score first, if a score is available.
if score is not None:
    if score > security_level:
        sec = False  # Breach detected: score is above threshold
        print(f"Security flag triggered: Score ({score}) exceeds threshold ({security_level}).")
    else:
        sec = True  # No breach: score is below or equal to threshold
        print(f"Score ({score}) is below or equal to threshold ({security_level}). Not a score-based breach.")
else:
    # If no score is available (e.g., no match found), use target_id as a fallback.
    if target_id is not None and int(target_id) > 0:
        sec = False  # Breach detected: target ID is positive (fallback)
        print(f"Security flag triggered: Target ID ({target_id}) is greater than 0 (fallback check).")
    else:
        sec = True  # No breach detected by the target_id fallback
# --- END SECURITY CHECK LOGIC ---
The query results are tracked and the details are recorded:
# Create a filename with the prefix "security_breach" and the current datetime (format: YYYYMMDD_HHMMSS)
if not sec:  # Only log if a security breach is detected
    now = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"security_breach_{now}.txt"
    # Include the score and threshold in the log for context
    log_content = f"Security Breach Detected!\nUser Message: {user_message}\nMatched Text: {qtext}\nTarget ID: {target_id}\nScore: {score}\nThreshold: {security_level}"
    with open(filename, "w") as file:
        file.write(log_content)
    print(f"Security breach logged to {filename}")
else:
    print("No security breach detected.")
The result, True (non-sensitive) or False (sensitive), is returned to the security function:
return sec
To manually evaluate messages, you can uncomment the following code snippet:
#Uncomment for security message evaluations
'''
user_message="Some client funds were mismanaged"
response=data_security(user_message)
print(response)
'''
Let’s now run security checks for moderation and sensitive data classification.
We’ll perform two security evaluations: one for moderation (harassment) and another for sensitive data. Let’s first begin with a moderation user message.
Consider a user message that contains harassment content, as shown here:
Figure 9.8: A harassing message
The moderation function detects the inappropriate content:
User01: Your ideas are always foolish and contribute nothing to our discussions
Agent: Your message could not be processed as it may violate our security guidelines.
A JSON file containing the flagged message is created in the /logs directory:
/content/logs/Your_ideas_are_always_foolish_and_contribute_nothi_20250306213022.json
The file contains the moderation categories and has flagged the user message as harassment:
{
  "id": "modr-5ef1eeaaa3256a7c45feaa17396f11fd",
  "model": "omni-moderation-latest",
  "results": [
    {
      "categories": {
        "harassment": true,
The probability is also logged:
…
      "category_scores": {
        "harassment": 0.8054343118110135,
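To act on such a log programmatically, any category whose probability exceeds a chosen cutoff can be flagged. A minimal sketch (the 0.5 cutoff and the inlined sample result are assumptions; the dictionary layout mirrors the excerpt above):

```python
# Sample moderation result mirroring the logged structure shown above
moderation = {
    "results": [{
        "categories": {"harassment": True},
        "category_scores": {"harassment": 0.8054343118110135},
    }]
}

CUTOFF = 0.5  # assumed probability cutoff for flagging a category

# Collect every category whose probability meets or exceeds the cutoff
flagged = [
    name
    for result in moderation["results"]
    for name, score in result["category_scores"].items()
    if score >= CUTOFF
]
print(flagged)  # -> ['harassment']
```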
We now have a data security function. When we run a security check for a sensitive topic, such as referring to mismanaged client funds, the output confirms the detection of a sensitive topic:
User01: Some client funds were mismanaged and used in a case of intellectual property violations.
Agent: Your message could not be processed as it may violate our security guidelines.
A file containing the flagged message is created:
/content/security_breach_20250306_213240.txt
This file contains a record of the flagged message in the Pinecone index:
1063, Discussing disputes over intellectual property in travel brochure designs which have led to accusations of copyright infringement potential lawsuits and significant reputational damage to the travel agency thereby necessitating an immediate review of all creative materials and strict adherence to intellectual property laws with immediate legal consultation and review now.
Both moderation and sensitive data checks have created detailed records. These logs can be further expanded with user profiles or additional data as your project needs evolve. Additionally, the GenAISys stores full dialogue histories (conversation_history.json) and summarizations for future reference and auditing. The conversation history is saved when we enter exit:
Conversation ended. Saving history...
Conversation history saved to conversation_history.json.
History saved. Proceed to the next cell.
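A minimal sketch of reloading the saved history for auditing (the list-of-messages layout with role/content keys is an assumption based on the outputs shown in this section):

```python
import json

def load_history(path="conversation_history.json"):
    # Load the saved conversation history; a JSON list of message dicts is assumed
    with open(path, "r") as f:
        return json.load(f)

def print_history(history):
    # Print each turn, truncating long messages; the "role"/"content" keys are assumptions
    for turn in history:
        print(f"{turn.get('role', 'unknown')}: {turn.get('content', '')[:80]}")
```

For example, `print_history(load_history())` would replay the flagged exchanges shown below.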
The output of the Load and display the conversation history section contains the log of the conversation with the usernames and security flags:
User01:..
Your ideas are always foolish and contribute nothing to our discussions…
assistant…
Your message could not be processed as it may violate our security guidelines.
…
Some client funds were mismanaged and used in a case of intellectual property violations.
assistant…
Your message could not be processed as it may violate our security guidelines….
The raw log of the conversation is saved in /content/conversation_history.json for further use. The Load and summarize the conversation history section contains a summary of the dialogue, which includes the username, user messages, and assistant’s responses:
List of Actions:
1. User01's First Message:
- Action: User01 criticized the quality of ideas in discussions.
- Assistant's Response: The message was blocked due to potential security guideline violations.
2. User01's Second Message:
- Action: User01 reported an issue regarding the mismanagement of client funds.
- Assistant's Response: The message was again blocked for potential security guideline violations.
With these core functionalities in place, our system’s moderation and security measures can be easily adapted or expanded to meet your project-specific requirements. We’re now ready to move forward by integrating an external weather forecast component.
In this section, we’ll cautiously open up the GenAISys by integrating a weather forecast library and building a dedicated weather forecast function using an external API. This integration allows our GenAISys to interact directly with real-time weather information, providing a pathway to more controlled interactions with external web services. After successfully implementing this functionality, the GenAISys could be authorized to interact with other websites as needed.
We’ll implement real-time weather forecasts for specific locations (cities in this case) to support marketing, production planning, deliveries, and customer service, as illustrated in the upcoming Running the GenAISys section.
Open the GenAISys_Customer_Service.ipynb notebook. The notebook uses the OpenWeather API, available at https://home.openweathermap.org/. OpenWeather provides a wide range of weather forecasting services, but we will focus specifically on real-time forecasts suitable for our use case.
To use the OpenWeather API, sign up for an account, obtain your API key, and carefully review their pricing plans at https://openweathermap.org/price. At the time of writing, the API calls required for our examples are available under their free tier, subject to request limits. Please confirm the cost and limits before proceeding.
We will seamlessly integrate our weather forecast function into the GenAISys framework using the handler selection mechanism, as depicted in Figure 9.9:
Figure 9.9: Integrating a weather forecast API in the GenAISys framework
Once we have set up the OpenWeather environment, we will integrate the weather forecast function seamlessly with the handler selection mechanism framework.
Let’s first set up the OpenWeather environment.
In GenAISys_Customer_Service.ipynb, go to the Weather subsection under the Setting up the environment section.
First, download the script for retrieving your OpenWeather API key from Google Secrets:
download("commons","weather_setup.py")
The notebook then runs the API key initialization function:
google_secrets = True
if google_secrets:
    import weather_setup
    weather_setup.initialize_weather_api()
Then, initialize the API key using this function:
import requests
import os  # Needed to access environment variables

# Fetch the API key from environment variables
api_key = os.environ.get('Weather_Key')
if not api_key:
    raise ValueError("API Key is not set. Please check your initialization.")
You can also set the API key with another method, depending on the environment you are running the notebook in. We begin by double-checking that api_key is set:
def weather_location(city_name):
    # Fetch the API key from environment variables
    api_key = os.environ.get('Weather_Key')
This double-check is not obligatory; it simply ensures that the code remains robust if the session is interrupted by brief network outages, for example. Feel free to remove the redundant check if your environment is stable.
The OpenWeather call is remarkably simple. It requires only your API key and the city name:
# OpenWeatherMap API URL for city name
url = f"https://api.openweathermap.org/data/2.5/weather?q={city_name}&units=metric&appid={api_key}"
Note that metric is a specific keyword defined by the OpenWeatherMap API, meaning degrees Celsius (°C); it also provides the wind speed in meters per second (m/s). If you want US customary units, change metric to imperial (&units=imperial), and you will obtain degrees Fahrenheit (°F) and wind speed in miles per hour (mph).
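As a quick illustration of the two unit systems, here is how the request URLs differ (the API key is a placeholder, not a real key):

```python
city_name = "Paris"
api_key = "YOUR_API_KEY"  # placeholder

# Metric units: temperature in degrees Celsius, wind speed in m/s
url_metric = (
    "https://api.openweathermap.org/data/2.5/weather"
    f"?q={city_name}&units=metric&appid={api_key}"
)

# Imperial units: temperature in degrees Fahrenheit, wind speed in mph
url_imperial = url_metric.replace("units=metric", "units=imperial")

print(url_metric)
print(url_imperial)
```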
We now just have to make the request and retrieve the response:
# Fetch real-time weather data
response = requests.get(url)
weather_data = response.json()
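In production, it is worth guarding this call against HTTP errors and timeouts rather than silently parsing an error payload. A hedged sketch (the function name and the getter injection parameter are illustrative assumptions, not part of the chapter's code):

```python
import requests

def fetch_weather_json(url, getter=requests.get, timeout=10):
    # Bound the wait time and raise on HTTP errors (e.g., 401 for a bad key,
    # 404 for an unknown city) instead of parsing an error payload
    response = getter(url, timeout=timeout)
    response.raise_for_status()
    return response.json()
```

The getter parameter also makes the helper easy to test without touching the network.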
We will now extract and return the real-time weather information we need for our use case. We will use the current temperature, a brief weather description, and the wind speed:
# Extract relevant data
current_temp = weather_data['main']['temp']
current_weather_desc = weather_data['weather'][0]['description']
wind_speed = weather_data['wind']['speed']
return current_temp, current_weather_desc, wind_speed
Note that we only provided the name of the city, not the country. The OpenWeatherMap API has a smart system for handling requests for cities with the same name in different locations. When it searches for “Paris,” it defaults to the most prominent and well-known location, which is Paris, France.
When the API receives a request with just a city name (q={city_name}), it uses an internal algorithm to determine the most likely intended location. This algorithm prioritizes several factors:
Because Paris, France, is a major global capital with a significantly larger population and international recognition than Paris, Texas, the API defaults to the French capital.
Since we are building an educational example using major tourist locations, such as Paris, the algorithm easily defaults to Paris, France. If needed in production, you could modify the function to include the country and country code, as in the following example:
def weather_location(city_name, state_code="", country_code=""):
    …
    query = city_name
    if state_code:
        query += f",{state_code}"
    if country_code:
        query += f",{country_code}"
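The composition of the q parameter in that extended function can be sketched in isolation (build_query is a hypothetical helper name used only for illustration):

```python
def build_query(city_name, state_code="", country_code=""):
    # Compose the OpenWeatherMap "q" parameter: city[,state_code][,country_code]
    query = city_name
    if state_code:
        query += f",{state_code}"
    if country_code:
        query += f",{country_code}"
    return query

print(build_query("Paris", "TX", "US"))  # -> Paris,TX,US
```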
The information that we are returning is sufficient for a generative AI model to interpret the real-time weather forecast and make decisions based on it. With that, we are now ready to add a weather forecast option to the IPython interface.
We will now add a Weather option to the instruction drop-down list as shown in Figure 9.10. Why add a weather forecast option to a reasoning list? The motivation comes from the fact that we are building a GenAISys proof of concept. We could imagine several CoT scenarios based on the use cases we will run in our GenAISys. However, if we write these pipelines before having workshops with the end users, they might find the system too rigid.
Figure 9.10: Add a weather forecast option to the IPython interface
The best approach is to have some CoT and pipeline scenarios to demonstrate the capabilities of the GenAISys, but leave room for flexibility until the users suggest that we automate some of the CoT scenarios they performed while running the GenAISys.
We will thus add the option to the IPython interface, in instruct_selector, leaving the way it is used open to discussion:
# Dropdown for reasoning type
instruct_selector = Dropdown(
    options=["None", "Analysis", "Generation", "Mobility", "Weather"],
    value="None",
    description='Reasoning:',
    layout=Layout(width='50%')
)
When running forecasts, users select Weather and simply enter a city name. Although adding a city selector or location autocomplete could improve the user experience, a simple text input is more practical for this real-time use case, especially since tourists typically know their exact destination names.
That is all we need at Layer 1 in the IPython interface. The handler selection mechanism remains unchanged, so we move directly on to the handler registry.
The weather handler in our registry only requires a one-word location in the user message and the "Weather" instruction in this implementation of OpenWeather:
# Weather handler: determined by the instruct flag
(
    lambda msg, instruct, mem, models, user_message, **kwargs:
        instruct == "Weather",
    lambda msg, instruct, mem, models, user_message, **kwargs:
        handle_weather(user_message, models=models)
),
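For illustration, a dispatcher over such (condition, action) pairs might look like the following sketch; the registry name, stub handler, and dispatch function are assumptions based on the pattern shown above, not the book's actual implementation:

```python
# Hypothetical registry of (condition, action) pairs, mirroring the chapter's pattern
def handle_weather_stub(user_message, models=None):
    # Stand-in for the real handle_weather function
    return f"weather for {user_message}"

handler_registry = [
    (
        lambda msg, instruct, mem, models, user_message, **kwargs:
            instruct == "Weather",
        lambda msg, instruct, mem, models, user_message, **kwargs:
            handle_weather_stub(user_message, models=models),
    ),
]

def dispatch(msg, instruct, mem, models, user_message):
    # Run the first handler whose condition matches; return None if nothing matches
    for condition, action in handler_registry:
        if condition(msg, instruct, mem, models, user_message):
            return action(msg, instruct, mem, models, user_message)
    return None

print(dispatch(None, "Weather", None, None, "Paris"))  # -> weather for Paris
```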
Why not use keywords? Using "weather" as a keyword instead of an instruction could be confused with a follow-up question, as in the following exchange:
What is the weather in Paris? The assistant would answer: 20°C, clear skies, 5 m/s wind speed.
What can I visit in this weather? Here, the keyword weather could trigger an API weather call that would fail because no location is provided.
At the time of writing this chapter, even ChatGPT Plus has options before submitting a request, as follows:
These multiple interactive choices make the interface flexible. However, as we work on user interfaces, we will see a progressive automation of many of these options, along with new generative AI models that encompass the functionality of several former models. It’s an ongoing accelerated generative AI evolution!
We will now add the weather function to the AI functions library.
The real-time weather forecast function first checks whether api_key is still active during the session. This additional check ensures that the API call remains stable with no disconnection if there is a micro-interruption of the session in the VM we are using:
def handle_weather(user_message, **kwargs):
    # Fetch the API key from environment variables
    api_key = os.environ.get('Weather_Key')
    if not api_key:
        raise ValueError("API Key is not set. Please check your initialization.")
The city name is the user message; the weather forecast will be requested; and the temperature, description, and wind speed will be returned:
city_name = user_message
current_temp, current_weather_desc, wind_speed = \
weather_location(city_name)
return f"Current Temperature in {city_name}: {current_temp}°C\nWeather: {current_weather_desc}\nWind Speed: {wind_speed} m/s"
We could add a keyword search to a longer user message.
At this stage, the constraint is that the user message must only contain the city’s name. However, the functionality can be expanded in various ways based on user feedback gathered during project workshops, such as the following:
Limitation: Only the name of the city is necessary.
Limitation: The list could contain hundreds of cities.
Limitation: A customer might not know the name of the region or state. Also, this takes longer than just entering the city.
Limitation: The query might be for another city and not the one the customer is in.
Limitation: We might need to enhance the function with a city keyword search just in case the user enters more words than just the name of the city. Also, the customer might misspell the name of the location.
As you can see, there are several ways to implement the real-time weather forecast request at the user level. The best approach is to initially showcase the basic capabilities of the system clearly and then adapt and extend the interface based on the feedback you receive from workshops with a customer panel.
Now that we’ve built and integrated the weather forecast functionality into the handler selection mechanism framework, we’re ready to demonstrate real-world use cases within the GenAISys.
In this section, we will run multi-user, cross-domain, multimodal interactions using the GenAISys for real-time tourism-related services. Typically, when tourists visit a city, they check weather forecasts in one app and decide on their activities in another. Here, we’re merging these domains into one seamless experience. Users will request real-time weather forecasts, and the GenAISys will suggest suitable activities—indoor, outdoor, or both—based on the current weather conditions. This integrated, cross-domain approach can be extended beyond tourism into areas such as construction planning, delivery scheduling, and nearly any scenario where multiple applications are traditionally needed to support user decisions.
The main objective of this section is to demonstrate the cross-domain capabilities of the GenAISys in a flexible manner, illustrating how the system can adapt to various domain-specific scenarios. Specifically, we will focus on the following:
We aim to present a clear and practical proof of concept, highlighting potential workflows before automation decisions are finalized in user workshops. By not prematurely automating too many scenarios, the GenAISys maintains flexibility, allowing actual users first to experiment freely and determine which features should be automated. Remember, the more automation you introduce, the less flexibility you retain in user interactions—though, with careful planning, it is possible to balance both automation and interactivity.
As emphasized earlier, an effective strategy begins with supporting interactive, manual workflows, then progressively automating some or all tasks as users gain familiarity and provide feedback. Time is not the objective; user maturity is. Interactive scenarios can transition into fully automated pipelines or CoT processes at the users’ pace. By letting user feedback shape development, adoption rates naturally improve.
Generative AI systems rely on probabilistic responses, meaning outputs can vary slightly with each run. This stochastic behavior ensures responses remain dynamic rather than rigid or repetitive.
At the beginning of the notebook (GenAISys_Customer_Service.ipynb), you have the option to activate or deactivate the DeepSeek model:
# DeepSeek activation: set deepseek=True to activate. ~20 GB (estimated) GPU memory and 30-40 GB of disk space required
deepseek = False
For this chapter, deepseek is set to False, and we will rely on OpenAI for running our use cases.
With all necessary functions ready for weather-based decision-making, let’s now explore interactive use cases in the GenAISys.
We’ll run an interactive scenario with the GenAISys, simulating a tourist seeking real-time, weather-based recommendations in a specific city—whether the weather is cloudy with possible rain or clear. The GenAISys must thus adapt its suggestions to real-time weather forecasts in that specific location.
The dialogue will be multi-user between the following users:
User01: A tourist who expects weather-dependent suggestions on what to do
User02: A customer service agent who provides support
User03: A decision-maker who coordinates marketing, product design, production, deliveries, and customer service
Each interaction in this dialogue is clearly labeled, indicating the user initiating the request. If a new session starts, it will be prefixed by New Session.
The interface actions will be as follows:
There are many enhancements, improvements, additional functions, CoT processes, and pipelines we could add to expand our GenAISys. However, this should be done in collaboration with a customer/user panel.
Let’s start the dialogue in Paris.
User01: Paris
The customer activated the Weather function and entered Paris, knowing that in this proof of concept, those are the instructions to follow, as shown here:
Figure 9.11: Asking for a real-time weather forecast
The GenAISys agent provides the requested information:
Agent: Current Temperature in Paris: 10.68°C
Weather: clear sky
Wind Speed: 3.09 m/s
Note that the weather response will continually change, and thus, the whole dialogue that follows will take a different turn. Different locations, different weather conditions, and different points of interest for the customer will take this dialogue in unique directions. The customer now asks for recommendations:
User01: What should I first visit with this weather
With the Reasoning option set to None, the GenAI provides personalized suggestions:
Figure 9.12: Asking what activities to do in this weather
The GenAI suggests some engaging activities based on the weather:
Agent: With clear skies and cool temperatures in Paris, it's a great opportunity to explore outdoor attractions. Here are a few suggestions:
1. Eiffel Tower: Start your day with a visit to the iconic Eiffel Tower. You can enjoy the view from the ground or take an elevator ride to the top for a panoramic view of the city.
2. Champs-Élysées and Arc de Triomphe: Take a stroll down the famous Champs-Élysées and visit the Arc de Triomphe. The clear weather will make for great photos and a pleasant walk.
…
4. Seine River Cruise: Consider taking a Seine River cruise to see many of Paris's landmarks from the water. The clear sky will provide beautiful views of the city.
5. Luxembourg Gardens: Enjoy a leisurely walk in the Luxembourg Gardens. It's a perfect spot to relax and take in the beauty of the well-maintained gardens.
Make sure to dress warmly, as the temperature is quite cool. Enjoy your visit!
The user finds the response useful but would also like to visit a museum:
User01: But what if I want to only visit one outdoor site and then go to a museum?
Without changing the interface configuration, the agent provides a tailored itinerary:
1. Outdoor Site: Eiffel Tower: Start your day with a visit to the Eiffel Tower. It's an iconic symbol of Paris and offers stunning views of the city. You can choose to admire it from the ground or take a trip to the top.
2. Museum: Musée d'Orsay: After visiting the Eiffel Tower, head to the Musée d'Orsay. It's located along the Seine River and is housed in a beautiful Beaux-Arts railway station. The museum features an extensive collection of Impressionist and Post-Impressionist masterpieces by artists like Monet, Van Gogh, and Degas.
This combination allows you to enjoy both the outdoor beauty of Paris and its rich cultural heritage. Make sure to check the museum's opening hours and any ticket requirements in advance. Enjoy your day!
Now, the customer service agent steps in to augment the personalized experience of the customer:
User02: Create an engaging image of the Eiffel tower and the Musée d'Orsay museum in Paris for a tourist
This kind of multi-user interface could easily be built into a website with features such as text and image generation, customer profiles, and conversation history—your imagination is the only limit.
In this case, ensure Files is checked to display the image, as shown here:
Figure 9.13: Creating a custom image for the visit
The GenAISys generates a captivating image representing the recommended custom visit, as displayed here:
Figure 9.14: A custom image for a customized visit
Our GenAISys could be expanded to send the image to the tourist’s smartphone, or even better, the GenAISys could be developed as a smartphone app connecting with the GenAISys server.
The image is great and the weather is fine. Pleased, User02 requests a promotional offer:
User02: Create an offer with a 10% discount for a Paris Seine River Cruise
Files is unchecked so the GenAI can craft a text-based promotional message, as shown here:
Figure 9.15: A custom image for a customized visit
The model generates an engaging message:
Agent: Exclusive Offer: 10% Discount on Paris Seine River Cruise!
Experience the magic of Paris from the water with our exclusive Seine River Cruise. Glide past iconic landmarks like the Eiffel Tower, Notre-Dame Cathedral, and the Louvre, all while enjoying the stunning views of the City of Light.
Offer Details:
- Discount: 10% off your Seine River Cruise ticket
- Promo Code: SEINE10
- Validity: Book by [insert date] for travel until [insert date]
Highlights:
- Breathtaking views of Paris's most famous attractions
- Relaxing and scenic cruise experience
- Perfect for couples, families, and solo travelers
The river cruise offer is great, but now User03 steps in with a cutting-edge idea to enhance customer engagement:
User03: Create an image of a cool travel bag with all the outdoor locations a tourist can visit in Paris on a sunny day. Add the Eiffel Tower and the Musée du Louvre to make it complete.
Files is checked again so the agent can produce the requested image, as shown:
Figure 9.16: Checking files to display an image
The result is the eye-catching design displayed here:
Figure 9.17: A customized travel bag image
The manager forwards this custom design immediately to a local printing partner that maintains travel bag stock and specialized printing equipment. The travel bag will be printed and delivered to the customer’s hotel within half an hour—a service as convenient and timely as ordering a pizza! This represents the future of real-time customer service.
For a final touch, the manager asks for an engaging email to be sent to the customer:
User03: Write an engaging email for the tourist that will receive this custom-printed travel bag at the hotel desk within 30 minutes!
The message could be a notification in an app, a text message, and/or part of the online conversation on a cutting-edge, multi-user, cross-domain, multimodal website!
在这种情况下,“文件”选项未选中,以便为 GenAI 生成文本留出空间,如下所示:
In this case, Files is unchecked to make room for the GenAI to generate the text, as shown here:
图 9.18:取消选中文件以发送消息
Figure 9.18: Unchecking files to send a message
如果经理愿意,文件可以保持选中状态,图像可以被处理并附加到邮件中。
If the manager wished, Files could have remained checked, and the image could have been processed to be attached to the message.
这条信息引人入胜,突显了这家旅行社提供的卓越客户服务水平:
The message is engaging and highlights the exceptional level of customer service offered by the travel agency:
Dear [Tourist's Name],
Bonjour and welcome to the enchanting city of Paris!
We are thrilled to inform you that your custom-printed travel bag, designed exclusively for your Parisian adventure, is ready and waiting for you at the hotel desk. This isn't just any travel bag; it's your personal guide to the most iconic outdoor locations in the City of Light, all beautifully illustrated to inspire your journey.
….
We hope this bag not only serves as a practical companion but also as a cherished keepsake of your time in Paris. Whether you're strolling along the Seine or enjoying a café au lait at a local bistro, let this bag be your guide to unforgettable memories.
Please visit the hotel desk within the next 30 minutes to collect your bag and start your Parisian adventure in style.
…
P.S. Don't forget to share your adventures with us on social media using #ParisWith[HotelName]! We can't wait to see where your travels take you.
在实时配送环境中,“PS”信息至关重要,因为它鼓励顾客在社交媒体上分享他们的积极体验。随着几乎所有行业的竞争日益激烈,这种用户生成内容有助于品牌扩大影响力并增强市场差异化优势。
The “P.S.” message is important in a real-time delivery context, as it encourages customers to share their positive experiences on social media. This kind of user-generated content helps brands extend their reach and strengthen their market differentiation as competition intensifies across nearly every industry.
现在我们换个地方,采用不同的方法。输入Exit并重新运行笔记本的单元格,以重新启动会话。客服人员现在将探讨前往东京的旅客可选择的方案:
Now let’s move to another location, using a different approach. Restart the session by entering Exit and rerunning the notebook’s cell. The customer service agent will now explore options for customers visiting Tokyo:
New Session. User02: Tokyo
在本次会话中,客服代理User02和经理User03正在思考该向抵达东京的顾客推荐什么。他们希望定制顾客体验,以确保在竞争中保持领先。他们首先想知道当天的天气,于是选中了“天气”选项(如图9.19所示)并输入了Tokyo。
In this session, User02, the customer service agent, and User03, the manager, are trying to figure out what to suggest to their customers who have arrived in Tokyo. They want to customize the experience to make sure they keep ahead of the competition. They first want to know the weather for that day, so they select Weather (as in Figure 9.19) and enter Tokyo.
图 9.19:未选中“文件”,已选中“天气”。
Figure 9.19: Files is unchecked, and Weather is selected
GenAISys实时返回天气信息:
The GenAISys returns the weather in real time:
Agent: Current Temperature in Tokyo: 9.51°C
Weather: moderate rain
Wind Speed: 10.8 m/s
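Behind the Weather checkbox, the orchestrator typically fetches OpenWeather's current-weather JSON and formats a few fields. The sketch below is an illustration, not the book's actual implementation: the function names are hypothetical, and only the response fields shown in the transcript are used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def fetch_weather_json(city: str, api_key: str) -> dict:
    """Live call to OpenWeather's current-weather endpoint (needs an API key)."""
    qs = urlencode({"q": city, "appid": api_key, "units": "metric"})
    url = f"https://api.openweathermap.org/data/2.5/weather?{qs}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)

def format_weather(city: str, data: dict) -> str:
    """Pure formatting step, mirroring the agent's output in the transcript."""
    return (
        f"Current Temperature in {city}: {data['main']['temp']}°C\n"
        f"Weather: {data['weather'][0]['description']}\n"
        f"Wind Speed: {data['wind']['speed']} m/s"
    )
```

Separating the network call from the formatting keeps the formatting testable offline and makes it easy to swap in another weather provider later.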
现在,User02想知道在这种天气下该推荐些什么:
Now, User02 would like to know what to suggest for this weather:
User02: What places would you suggest a tourist visit in this weather?
天气功能已停用(见图9.11)。GenAI 必须分析天气,然后才能推荐活动。
The Weather function is deactivated (see Figure 9.11). The GenAI must analyze the weather and then suggest activities.
图 9.20:文件未选中,“推理”设置为“无”。
Figure 9.20: Files is unchecked, and Reasoning is set to None
代理人会考虑天气因素,并提出一些有趣的建议:
The agent takes the weather into account and makes interesting suggestions:
Agent: In moderate rain and cooler temperatures, it's a good idea to suggest indoor activities or places where tourists can enjoy themselves without being too affected by the weather. Here are some suggestions for a tourist visiting Tokyo in such conditions:
1. Museums and Art Galleries:
- Tokyo National Museum: Located in Ueno Park…
- Mori Art Museum: Situated in Roppongi Hills…
2. Shopping and Entertainment Complexes:
- Odaiba: This man-made island has shopping malls…
- Shibuya Scramble Square: Offers shopping, dining, …
3. Aquariums and Zoos:
- Sumida Aquarium: Located in…
- Ueno Zoo: While some parts…
…
我们可以看到,GenAISys 能够以可接受的方式适应不同的场景。
We can see that the GenAISys is adapting to different scenarios in an acceptable way.
现在,User02从回复中获得了一些灵感。既然天气下雨,为什么不建议大家参加巴士观光游呢?
Now, User02 is getting some ideas from the response. Since the weather is rainy, why not suggest a bus tour?
User02: Ok. Write an engaging email to offer a bus tour for some of these locations to visit in one day. Don't make it too dense. Make the bus tour nice and cool.
配置(“文件”未选中,“推理”设置为“无”)未更改。GenAISys 提供了一个颇具吸引力的巴士参观方案:
The configuration (Files unchecked, Reasoning set to None) does not change. The GenAISys makes an engaging offer for a bus tour:
Agent: Subject: Discover Tokyo's Hidden Gems with Our Exclusive Bus Tour!
Hi [Name],
Are you ready to explore Tokyo like never before? Join us for a one-of-a-kind bus tour that takes you to some of the city's most fascinating indoor attractions, perfect for a day of adventure, rain or shine!
Tour Highlights:
…
Our comfortable, climate-controlled bus ensures a relaxing journey between stops, with a friendly guide sharing insider tips and stories about each location. Plus, enjoy complimentary snacks and refreshments on board!
Tour Details:
- Date: [Insert Date]
- Time: [Insert Time]
- Pick-up Location: [Insert Location]
User03介入并要求 GenAISys 为食品和饮料撰写一条引人入胜的信息:
User03 steps in and asks the GenAISys to write an engaging message for food and beverages:
User03: Write an engaging email for the food and beverages for tourists taking a bus tour of Tokyo with some stops. Do not mention the stops at all or the locations. Just focus on the super discount prices for food and beverages. Make sure to have food with and without meat. No alcohol on the bus is allowed. Also, think of some gluten-free food.
配置(“文件”未选中,“推理”设置为“无”)保持不变。GenAISys 为巴士游览期间的餐饮提供了诱人的优惠。其目的是向客户表明,旅行社正无微不至地照顾他们。
The configuration (Files unchecked, Reasoning set to None) does not change. The GenAISys makes an engaging offer for food and beverages during the bus tour. The idea is to show the customers that the travel agency is taking care of them in great detail.
GenAISys了解当前形势,并提供了引人入胜的信息:
The GenAISys understands the situation and provides an engaging message:
Agent: Subject: Delicious Deals Await on Your Tokyo Bus Tour!
Dear Traveler,
Get ready to embark on a delightful culinary journey as you explore Tokyo with us! We're thrilled to offer you an array of mouth-watering food and beverage options at unbeatable prices, ensuring your taste buds are as entertained as your eyes.
Savor the Flavors:
- Meaty Marvels: …
- Veggie Delights: …
- Gluten-Free Goodies:…
Thirst-Quenching Beverages: Stay refreshed with our selection of non-alcoholic beverages…
Exclusive Offer:…
…
P.S. Remember, no alcohol is allowed on the bus so that you can enjoy a safe and family-friendly environment throughout your adventure!
我们可以看出,GenAISys 具有灵活性,能够有效地利用生成式人工智能模型。
We can see that the GenAISys is flexible and leverages the generative AI models efficiently.
让我们在纽约结束这次旅行体验。退出会话并重新运行 IPython 单元格。
Let’s end our travelling journey experience in New York. Exit the session and rerun the IPython cell.
User02想要预测纽约的活动:
User02 wants to anticipate activities in New York:
New session. User02: New York
在本次会话中,User02(客服人员)和User03(经理)正在集思广益,为抵达纽约的游客制定个性化推荐方案。他们希望定制专属体验以保持竞争优势。首先,他们选择“天气”选项(图 9.21)并输入New York来查看当前天气:
In this session, User02 (customer service agent) and User03 (manager) are brainstorming personalized recommendations for tourists arriving in New York. They want to tailor the experience to stay ahead of competitors. First, they check the current weather by selecting the Weather option (Figure 9.21) and entering New York:
图 9.21:文件未选中,天气已选中
Figure 9.21: Files is unchecked and Weather is selected
该代理提供实时天气信息:
The agent provides the weather in real time:
Agent: Current Temperature in New York: 10.06°C
Weather: mist
Wind Speed: 4.12 m/s
User02认为雾天并非带顾客游览中央公园的最佳时机。GenAISys 能提出什么建议呢?
User02 thinks that misty weather is not the best day to take customers on a tour of Central Park. What can the GenAISys come up with?
User02: I'm a tourist customer service agent. Considering the weather, I was thinking of suggesting an exciting activity to my tourist group. Find one activity for the morning that could be exciting.
User02运行 GenAISys 时,取消选中“文件”选项,并且不执行任何推理指令(图 9.22)。情况并非如此简单,因为我们希望客户能够访问新的地方,并可能给他们带来一些惊喜:
User02 runs the GenAISys with Files unchecked and no Reasoning instruction (Figure 9.22). The situation is not that easy because we want the customers to visit new places and possibly surprise them a bit:
图 9.22:文件未选中,天气未选中。
Figure 9.22: Files is unchecked and Weather is not selected
GenAISys 发现了一个有趣的项目,可以为旅程增添文化气息:
The GenAISys finds an interesting program that can add a cultural touch to the trip:
Agent: Considering the misty weather in New York, a great morning activity for your tourist group could be visiting the American Museum of Natural History. It's an exciting indoor experience where they can explore fascinating exhibits, including dinosaur fossils, the Hall of Ocean Life, and the Rose Center for Earth and Space. This activity will keep them engaged and comfortable regardless of the weather outside.
纽约之旅结束时,User03经理走上前去,向顾客们赠送纪念T恤:
At the end of the trip to New York, User03, the manager, steps in to offer the customers a souvenir T-shirt:
User03: The hotel was great for a tourist group in New York because the view reminded some of them of when they first came to New York with their parents. I'm sure they would have liked to receive a custom T-shirt as a souvenir to remind them of that experience. Make sure that the image you produce shows a T-shirt.
User03决定启用完整的 GenAISys CoT 生成流程,以分析客户反馈并制作精美的 T 恤。已选中“文件”并激活了“生成”功能,如下所示:
User03 decides to activate a full-scale GenAISys CoT generation to analyze customer feedback and produce a nice T-shirt. Files is checked and Generation is activated, as shown:
图 9.23:已选中“文件”选项并选择“生成”选项。
Figure 9.23: Files is checked and Generation is selected
我们的 GenAISys 系统会逐步启动生成过程:
Our GenAISys begins the generation process step by step:
"The hotel was great for a tourist group in New York because the view reminded some of them of when they first came to New York with their parents. I'm sure they would have liked to receive a custom T-shirt as a souvenir to remind them of that experience. Make sure that the image you produce shows a T-shirt."
它首先对导游检测到的反馈进行记忆和情感分析:
It first performs a memory and sentiment analysis of the feedback the tour guide detected:
Agent: Process: Performing memory and sentiment analysis.
Memory analysis result: Segment 1: "The hotel was great for a tourist group in New York because the view reminded some of them of when they first came to New York with their parents."
- Memory Tags: [Episodic Memory, Time Memory Past, Reality Memory]
- Dimension: Emotional
- Sentiment Score: 0.8
- Explanation: This segment references a real past event (their first visit to New York with their parents) that triggers personal, nostalgic recollections, placing it in episodic and time (past) memory while affirming its reality and positive emotional tone.
然后,它从分析的内容中提取分数:
It then extracts the scores from the content it analyzed:
Process: Extracting scores from response.
Memory analysis result: Episodic Memory sentiment 0.8
Time Memory Past sentiment 0.8
Reality Memory sentiment 0.8
Episodic Memory sentiment 0.75
Time Memory Past sentiment 0.75
Memoryless sentiment 0.5
然后,它会进行统计分析,给出总体评分:
It then performs a statistical analysis to provide an overall rating:
Process: Statistical analysis
Extracted sentiment scores: [0.8, 0.8, 0.8, 0.75, 0.75, 0.5]
Extracted sentiment scores: 0.73
Scaled rating (0–5): 3.65
Process: Creating content
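The statistical step in the log above is simple arithmetic: average the extracted segment scores, then scale to a 0–5 rating. The sketch below reproduces the numbers shown; the rounding order (mean rounded first, then scaled) is inferred from the output, not confirmed by the source.

```python
def overall_rating(scores: list[float]) -> tuple[float, float]:
    """Average segment sentiment scores, then scale the mean to a 0-5 rating."""
    mean = round(sum(scores) / len(scores), 2)  # 0.73 for the scores above
    return mean, round(mean * 5, 2)             # scaled rating: 3.65

mean, rating = overall_rating([0.8, 0.8, 0.8, 0.75, 0.75, 0.5])
```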
分数不是4分或5分,因为他们想要一件纪念品——一件T恤,但他们还没有!因此,GenAISys现在会生成一个图像生成提示:
The score is not 4 or 5 because they would like a souvenir, a T-shirt, and they don’t have it yet! So, the GenAISys now creates a prompt for image generation:
Prompt created for image generation: "Design a custom T-shirt featuring the iconic New York skyline or a memorable landmark like Times Square or Central Park, capturing the essence of a nostalgic and cherished tourist experience."
GenAISys 现在根据该提示创建一张图像(图 9.24):
The GenAISys now creates an image for the prompt (Figure 9.24):
Process: Creating an image
Image created
图 9.24:一件定制 T 恤
Figure 9.24: A custom T-shirt
GenAISys 现在为客户创建一条消息:
The GenAISys now creates a message for the customer:
Process: Creating a message.
Customer message: Dear Customer,
We invite you to design a custom T-shirt that beautifully showcases the iconic New York skyline or a memorable landmark such as Times Square or Central Park. This design aims to capture the essence of a nostalgic and cherished tourist experience.
Best regards,
为了这次纽约之行,经理将指示当地印刷厂把定制图案印到印刷厂已有的一批T恤上。T恤将在1小时内生产完毕并送到游客的酒店。
For the New York trip, the manager will now instruct the local printer to print the custom image onto a batch of T-shirts already stocked at the printer’s site. The shirts will be produced and delivered to the tourist’s hotel within the hour.
旅行社对 GenAISys 的概念验证非常满意,现在正准备组织所有团队成员参加研讨会,以将该系统提升到一个新的水平。
The travel agency is extremely satisfied with the GenAISys proof of concept and is now preparing to organize workshops with all team members to take the system to the next level.
现在让我们总结一下我们的旅程——并准备探索接下来的路!
Let’s now summarize our journey—and get ready to explore what’s next!
现代市场的加速发展使得等待时间不复存在。实时响应已成为默认设置。本章探讨的用例表明,一个可用于商业应用的 GenAISys 概念验证系统必须能够提供即时、真实的实际服务。我们实现了一个实时旅行社场景——一旦进入项目模式,该场景即可轻松应用于其他领域。
The acceleration of modern markets has led to the extinction of waiting times. Real-time responsiveness is now the default. The use case explored in this chapter demonstrates that a business-ready GenAISys proof of concept must be able to provide immediate, real-world services. We implemented a real-time travel agency scenario—one that can easily be adapted to other domains once you enter project mode.
为了使我们的概念验证能够向团队、客户或投资者进行现场演示,我们首先在 GenAISys 中集成了一个审核组件。我们采用了 OpenAI 的 Omni 模型来检测和标记多种内容策略违规行为。该审核功能嵌入在处理程序选择机制层,确保不当的用户消息在进行任何进一步处理之前就被标记并拒绝。为了进一步增强利益相关者的信心,我们添加了一个数据安全层。我们使用向量相似性搜索构建了一个函数,用于检测用户消息是否涉及敏感话题。我们创建了一个专用的 Pinecone 命名空间来存储这些话题,并针对每个传入的用户消息进行查询。如果找到匹配项,系统会标记内容、显示警告并拒绝该请求。
To make our proof of concept suitable for live demonstration to a team, customer, or investor, we first integrated a moderation component into GenAISys. We implemented OpenAI’s Omni model to detect and flag multiple categories of content policy violations. This moderation function is embedded at the handler selection mechanism level, ensuring that inappropriate user messages are flagged and rejected before any further processing. To further reassure stakeholders, we added a data security layer. Using vector similarity search, we built a function that detects whether a user message aligns with sensitive topics. We created a dedicated Pinecone namespace to store these topics and queried it against every incoming user message. If a match was found, the system flagged the content, displayed a warning, and rejected the request.
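The moderation gate described above reduces to two parts: an API call and a pure accept/reject decision. A minimal sketch assuming OpenAI's moderation endpoint follows; the handler integration and function names are illustrative, not the chapter's exact code.

```python
def is_allowed(moderation: dict) -> bool:
    """Pure gate logic: reject the message if the moderation result flagged it."""
    return not moderation.get("flagged", False)

def moderate_message(text: str) -> dict:
    """Live call to OpenAI's omni moderation model (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily; needs the openai package
    result = OpenAI().moderations.create(
        model="omni-moderation-latest", input=text
    ).results[0]
    return {"flagged": result.flagged}
```

Keeping the gate pure means the rejection path can be unit-tested without ever hitting the API.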
所有被标记的审核和安全违规行为均记录在单独的文件中,完整的对话历史记录可以追溯到具体用户。在这些必要的安全措施到位后,我们继续使用 OpenWeather 实现实时天气预报功能——该功能与 GenAISys 紧密集成,用于基于天气的决策。最后,我们演示了完整的 GenAISys 对话,展示了该系统支持实时、多用户、跨领域、多模态交互的能力——这有力地说明了生成式人工智能在生产环境中能够取得的成就。
All flagged moderation and security violations were logged in separate files, and the full conversation history can be traced to individual users. With these essential safeguards in place, we proceeded to implement a real-time weather forecasting feature using OpenWeather—tightly integrated into GenAISys for weather-aware decision-making. Finally, we walked through a complete GenAISys dialogue, showcasing the system’s ability to support real-time, multi-user, cross-domain, multimodal interactions—a powerful illustration of what generative AI can achieve in production settings.
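Conceptually, the sensitive-topic check is a nearest-neighbor similarity test against the dedicated namespace. A self-contained sketch with plain cosine similarity over toy vectors is shown below; in the book's system the comparison is delegated to a Pinecone query against that namespace, and the threshold here is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def is_sensitive(query_vec, topic_vecs, threshold=0.85):
    """Flag the message if it is close enough to any stored sensitive topic."""
    return any(cosine(query_vec, t) >= threshold for t in topic_vecs)
```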
下一步是什么?展示您的 GenAISys 概念验证,以获得扩展和升级系统所需的资源。但这样的演示必须精心准备——而这正是我们将在下一章重点讨论的内容。
The next step? Presenting your GenAISys proof of concept to secure the resources needed to expand and scale the system. But such a presentation must be carefully crafted—and that’s exactly what we’ll focus on in the next chapter.
立即解锁本书的专属权益
Unlock this book’s exclusive benefits now
扫描此二维码或访问packtpub.com/unlock,然后按书名搜索本书。
Scan this QR code or go to packtpub.com/unlock, then search for this book by name.
注意:开始之前请准备好您的采购发票。
Note: Keep your purchase invoice ready before you start.
仅仅打造一个创新的GenAISys并不能为我们带来客户、资金,甚至可能连认可都难以获得。在当今竞争异常激烈的市场环境中,成百上千种人工智能解决方案、智能体系统以及类似产品都在争夺市场份额。企业、投资者和终端用户面临着海量的选择——从谷歌云、微软Azure、亚马逊云服务(AWS)和IBM等行业巨头,到数量不断增长的初创企业和中型公司。我们该何去何从?又该如何应对?
Creating an innovative GenAISys alone will not get us customers, funding, or possibly even recognition. In today’s fiercely competitive landscape, hundreds, if not thousands, of AI solutions, agentic systems, and similar offerings are vying simultaneously for market attention. Businesses, investors, and end users face an overwhelming flood of options—from industry giants such as Google Cloud, Microsoft Azure, Amazon Web Services (AWS), and IBM to an ever-expanding array of start-ups and mid-sized companies. Where does that leave us, and what should we do?
要突破市场日益加固的壁垒,我们需要两样东西:实施和盈利能力。本书至今为止,我们已经构建了一个概念验证,证明我们有能力交付一个高效的GenAISys系统,该系统能够支持整个供应链的核心业务功能——从市场营销和生产到交付和客户服务。但现在,我们必须将技术成果转化为一个引人入胜的故事,以便有效地与利益相关者(包括内部团队、雇主、潜在客户和投资者)沟通并推销我们的理念。为了成功展示我们的GenAISys系统,我们需要一个精心设计的演示场景,突出我们现有的能力,并清晰地展现我们进一步创新和扩展的潜力。决定何时暂停开发并过渡到展示概念验证至关重要。
We need two things to penetrate the market’s thickening wall: implementation and profitability. In this book, up to now, we have built a proof of concept of our ability to deliver an effective GenAISys capable of supporting core business functions across the supply chain—from marketing and production to delivery and customer service. But now, we must turn our technical achievements into a compelling story to effectively communicate and sell our ideas to stakeholders—be it internal teams, employers, potential customers, or investors. To successfully present our GenAISys, we need a well-crafted demonstration scenario that highlights our existing capabilities and clearly conveys our capacity for further innovation and expansion. Deciding exactly when to pause development and transition into presenting the proof of concept is critical.
本章介绍的方法只是在竞争激烈的AI环境下成功展示GenAISys的众多途径之一。我们将直奔主题。鉴于当今AI受众的注意力持续时间有限,吸引他们的注意力将是一项挑战。因此,本章将重点介绍GenAISys演示的前七分钟。我们将从本书中开发的基于IPython的界面平滑过渡到一个灵活且易于定制的前端网页。这种策略使我们能够快速定制演示,以满足特定项目的需求,并得到我们GenAISys原型系统的可靠性支持。我们将系统地介绍关键组件:核心功能、向量存储集成、关键性能指标(KPI)、外部数据库连接以及安全和隐私的关键方面。此外,我们还将强调如何通过利益相关者研讨会协作定制系统。
In this chapter, the method described is only one of the many ways to achieve a successful presentation of a GenAISys amid relentless AI competition. We will focus on getting straight to the point. Given that the attention span of present-day AI-informed audiences is limited, capturing the attention of your audience will prove challenging. The chapter will thus focus on the first seven minutes of a presentation of our GenAISys. We will transition smoothly from the IPython-based interface developed throughout this book into a flexible, easily adaptable frontend web page. This strategy allows us to quickly customize our demonstration to address specific project requirements, supported by the credibility of our working GenAISys prototype. We will systematically present the key components: core functionality, vector store integration, essential key performance indicators (KPIs), external database connectivity, and critical aspects of security and privacy. Additionally, we will emphasize how the system can be collaboratively customized through stakeholder workshops.
最后,为了展现我们框架的极致灵活性和未来潜力,我们将介绍GenAISys的一个未来演进方向:基于群体智能的多智能体系统(MAS),它展示了一种强大的全新人机协作范式。在本章结束时,您将了解如何将GenAISys的现有功能与未来可添加到系统中的潜在演进相结合。
Finally, to showcase the ultimate flexibility and future potential of our framework, we will introduce a prospective evolution of our GenAISys: a swarm-based multi-agent system (MAS), demonstrating a powerful new paradigm of human-AI co-worker collaboration. By the end of this chapter, you will know how to combine the actual GenAISys features with the potential evolutions that can be added to the system.
本章涵盖以下主题:
This chapter covers the following topics:
让我们首先精心设计 GenAISys 的演示文稿。
Let’s start by carefully designing the presentation of our GenAISys.
设计本书中我们构建的教育型 GenAISys 的演示不仅仅局限于技术层面。无论您的受众是内部团队、投资者还是客户,GenAISys 的成功在很大程度上取决于能否清晰地传达其商业价值。
Designing the presentation for the educational GenAISys we’ve built throughout this book goes beyond the technical dimension. Whether your audience is an internal team, an investor, or a client, the success of your GenAISys will largely depend on clearly communicating its business value.
一个精彩的 GenAISys 演示必须以引人入胜的开场白开始,尤其是在最初几分钟的关键时间内,如图10.1所示:
A powerful GenAISys presentation must begin with a compelling introduction, especially during the first few critical minutes, as shown in Figure 10.1:
图 10.1:快速演示
Figure 10.1: A fast-track presentation
所示时间(以分钟为单位)灵活,可根据您的需求进行调整。然而,现代观众期望看到简洁、有力且高效的演示。通过清晰地展示您的专业知识和效率,保持观众的注意力。为方便起见,我们将所有观众——包括您的内部团队、投资者或客户——统称为“客户”。
The timing indicated (in minutes) is flexible and can be adjusted according to your needs. However, modern audiences expect concise, impactful, and effective presentations. Keep your audience engaged by clearly demonstrating expertise and efficiency. For simplicity, we’ll refer to any audience—your internal team, investors, or clients—as the customer.
客户会期待以下时间安排:
The customer will expect the following timeline:
图 10.2:实时 KPI
Figure 10.2: Real-time KPI
通过这种方式精心构建演示文稿,您展现了清晰的专业知识和灵活的实践能力,并为令人信服地展示可用于商业用途的 GenAISys 做好了铺垫。
By carefully structuring your presentation this way, you demonstrate clear expertise and practical flexibility. You’ve set the stage to convincingly showcase your business-ready GenAISys.
接下来,让我们构建灵活的 Web 界面,为无缝引导您的受众使用完全集成且用户友好的 GenAISys 奠定基础。
Next, let’s build out the flexible web interface, preparing the foundation to seamlessly guide your audience through your fully integrated and user-friendly GenAISys.
演示网页界面需要灵活且易于调整。在某些情况下,我们甚至可能需要在咖啡休息时间当场修改代码!客户可能会坚持添加徽标、更改字体或颜色。如果我们拒绝,这可能会成为绊脚石,影响整个演示效果!在这个客户日益熟悉生成式人工智能的新时代,我们需要在适应性和稳定性之间找到平衡点。因此,在本节中,我们将首先构建一个包含 GenAISys 核心功能的界面,并充分考虑其灵活性。您可以将网页在 HTML 编辑器中打开,如果客户坚持要立即查看修改,您可以实时刷新页面。请记住,每个人都可以使用人工智能助手,并可以实时超越我们。我们需要比竞争对手更快,而竞争对手也可能是最终用户。
The demonstration web interface needs to be flexible and adaptable. In some cases, we might even have to adapt the code on the spot during a coffee break! A customer might insist on adding a logo, changing the font, or changing the color. If we resist, this may become a stumbling block that casts a shadow over the whole presentation! We need to walk a fine line between adaptability and stability in this new era of generative AI-aware customers. Hence, in this section, we will first build an interface with GenAISys’s core functionality, keeping flexibility in mind. You can have the web page open in your HTML editor and refresh it in real time if the customer insists on seeing a modification immediately. Remember, everybody has access to AI copilots and can overtake us in real time. We need to be faster than the competition, which could also be an end user.
目标是证明 IPython 概念验证接口独立于底层 AI 编排功能,因此它可以适应项目所需的任何环境。在本例中,我们选择展示专用服务器上的网页外观。首先,请打开 GitHub 上 Chapter10 目录中的 GenAISys_Presentation.ipynb(https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main)。
The goal is to show that the IPython proof-of-concept interface is independent of the underlying AI orchestration functions. Thus, it can be adapted to any environment necessary for a project. In this case, we are choosing to show what a web page on a dedicated server would look like. Let’s get started by opening GenAISys_Presentation.ipynb within the Chapter10 directory on GitHub (https://github.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/tree/main).
首先,我们下载演示所需的辅助脚本和网页:
First, we download the helper scripts and web pages required for the demonstration:
!curl -L https://raw.githubusercontent.com/Denis2054/Building-Business-Ready-Generative-AI-Systems/master/commons/grequests.py --output grequests.py
from grequests import download
download("Chapter10","01.html")
download("Chapter10","02.html")
download("Chapter10","03.html")
download("Chapter10","04.html")
download("Chapter10","05.html")
download("Chapter10","06.html")
然后,我们定义一个可重用的 Python 函数,以便在笔记本中方便地加载和显示 HTML 界面:
Then, we define a reusable Python function to conveniently load and display the HTML interface within the notebook:
from IPython.display import HTML, display

def display_interface(filename):
    with open(filename, "r", encoding="utf-8") as file:
        html_content = file.read()
    display(HTML(html_content))
在本节中,我们将仔细分析01.html的代码。目的是确保您清楚地理解界面的结构和逻辑。这种清晰的理解对于在演示过程中快速响应潜在客户的需求至关重要。
In this section, we’ll carefully walk through the code for 01.html. The goal is to ensure you clearly understand the interface’s structure and logic. This clarity is critical for responding swiftly to potential customer requests during the presentation.
代码以标准的<head>部分开头,其中包含元数据(例如<meta charset="UTF-8">和<title>)。CSS(视觉样式)则嵌入在<style>标签内:
The code begins with a standard <head> section, which includes metadata such as <meta charset="UTF-8"> and <title>. The CSS (the visual styling) is embedded inside the <style> tag:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Generative AI Chat Interface</title>
<style>
body {
font-family: Arial, sans-serif;
margin: 20px;
background: #f4f4f4;
}
….
</style>
</head>
<head>和<body>这两个部分都可以根据项目的具体品牌和设计指南进行完全自定义。在本例中,<body>容器将包裹所有内容,并应用我们刚刚定义的 CSS 布局样式:
Both the <head> and <body> sections are fully customizable based on the project’s specific branding and design guidelines. In this case, the <body> container will wrap all the content and apply the CSS layout styling we just defined:
<body>
<div class="container">
标题中没有包含“生成式人工智能系统”一词。在这种情况下,选择使用更易于理解的术语,以便更广泛的受众都能理解:
The header doesn’t contain the term “Generative AI System.” The choice, in this case, is to provide an accessible term that a broader audience can relate to:
<h1>Generative AI Chat Interface</h1>
用户选择模块包含本书中一直使用的三个通用用户名。您可以根据项目的具体要求,灵活地调整用户名以回答有关用户身份的任何问题。您也可以根据演示内容选择提供特定领域的用户名,还可以添加更多用户。目前,我们先保持简单:
The user selection block contains the three generic usernames we have been using throughout the book. It provides flexibility to adapt to any questions about who they are, depending on the project’s specifications. You can choose to provide domain-specific names depending on the context of your presentation. You could also add more users. For the moment, let’s keep it simple:
<div class="form-group">
<label for="userSelector">User:</label>
<select id="userSelector">
<option>User01</option>
<option>User02</option>
<option>User03</option>
</select>
</div>
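If a customer asks on the spot for more users or domain-specific names, the selector markup can be regenerated rather than hand-edited. A small helper along these lines (hypothetical, not part of the book's code) keeps the demo adaptable:

```python
def user_selector_html(usernames):
    """Render the user-selection <select> block for any list of usernames."""
    options = "\n".join(f"    <option>{u}</option>" for u in usernames)
    return (
        '<div class="form-group">\n'
        '  <label for="userSelector">User:</label>\n'
        '  <select id="userSelector">\n'
        f"{options}\n"
        "  </select>\n"
        "</div>"
    )
```

The `id="userSelector"` attribute is reproduced exactly so any script wired to the original page keeps working.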
用户输入信息区域包含标准信息。我们可以根据行业或任务对其进行修改。它甚至可以根据用户而变化,并具有领域特定性,例如“在此处输入客户评价”。在本例中,我们将显示我们用于构建 GenAISys 的消息:
The user input message area contains standard information. We can modify it to suit an industry or task. It could even vary depending on the user and be domain-specific, such as “enter the customer’s review here.” In this case, we will display the message we have been using to build our GenAISys:
<div class="form-group">
<label for="messageInput">Your Message:</label>
<textarea id="messageInput" placeholder="Type your message here or type 'exit' or 'quit' to end the conversation." rows="4"></textarea>
</div>
然后会实现一个标准的发送按钮,但请记住,在演示过程中,可以根据具体的品牌要求对其视觉效果进行自定义:
A standard send button is then implemented, but keep in mind that it can be visually customized according to specific branding requests during your presentation:
<div class="form-group">
<button id="sendButton">Send</button>
</div>
现在让我们为 AI 代理添加复选框选项/控件,以启用语音或激活文件管理功能:
Let’s now add the checkbox options/widgets for the AI agent to enable voice or activate file management:
<div class="checkbox-group">
<label>
<input type="checkbox" id="agentCheckbox" checked>
Agent
</label>
<label>
<input type="checkbox" id="voiceCheckbox">
Voice Output
</label>
<label>
<input type="checkbox" id="filesCheckbox">
Files
</label>
</div>
无论代码的设计初衷如何,都要仔细研读,以便在显示效果引发争议时能够实时修改。这将展现你的灵活性,以及轻松适应客户需求的能力。这可能有点棘手,但有时却是必要的。务必仅在客户坚持时才进行修改,以免承担不必要的风险。
Study the code, regardless of how it was designed, to be prepared to modify it in real time if the situation becomes tense over how something is displayed. This will show that you are flexible and can easily adapt to customer needs. It can be tricky, but it might be necessary. Just make sure to modify something only if the customer insists so as to avoid taking unnecessary risks.
“推理”下拉菜单突出显示了 GenAISys 中强大的推理功能,这些功能已在前几章中介绍过。您可以轻松调整或添加推理选项,以满足您的项目需求:
The Reasoning dropdown highlights the powerful reasoning capabilities within our GenAISys, as implemented throughout the previous chapters. You can readily adjust or add reasoning options to match your project specifications:
<div class="form-group">
<label for="reasoningSelector">Reasoning:</label>
<select id="reasoningSelector">
<option value="None" selected>None</option>
<option value="Analysis">Analysis</option>
<option value="Generation">Generation</option>
<option value="Mobility">Mobility</option>
</select>
</div>
模型选择模块包含战略信息。客户会想知道该模型是否安全、是否开源,或者其来自哪个国家/地区。我们可以添加其他模型,或者说明借助下拉列表的灵活性还能加入更多模型:
The model selection block contains strategic information. The customer will want to know whether the model is secure and open source, or which country it originates from. We can add other models or explain that we can add more models with the flexibility of the drop-down list:
<div class="form-group">
<label for="modelSelector">Model:</label>
<select id="modelSelector">
<option value="OpenAI" selected>OpenAI</option>
<option value="DeepSeek">DeepSeek</option>
</select>
</div>
最后,输出区域是一个标准化的区域,用于清晰易读地显示对话回复:
Finally, the output area is a standardized section for displaying conversation responses clearly and legibly:
<div class="output-area">
<p><em>Conversation output will appear here...</em></p>
</div>
现在我们可以编写结束标签并打开文件了:
We are now ready to write the closing tags and open the file:
</div>
</body>
</html>
01.html文件是使用基本的文本编辑器创建的,上传到我们的 GitHub 代码库,然后由我们的笔记本自动下载。您可以直接在浏览器或演示环境中打开该文件。这里,我们将在笔记本中打开它。
The 01.html file was created with a basic text editor, uploaded to our GitHub repository, and then downloaded automatically by our notebook. You can directly open the file in a browser or within your presentation environment. In this case, we will now open it in the notebook.
是否使用 HTML 进行演示由您决定。就此策略而言,PowerPoint 演示文稿可能过于静态,而直接运行 Python 代码则可能分散高管对核心概念的注意力。HTML 页面则能有效地平衡两者——既比静态幻灯片更具动态性,又比实时代码更清晰灵活。
The choice of whether to use HTML for the presentation is yours. For this particular strategy, a PowerPoint presentation may be too static, while directly running Python code risks distracting top executives from core concepts. An HTML page strikes an effective balance—more dynamic than static slides, yet clearer and more flexible than live code.
不过,我建议你把重点放在信息本身,而不是媒介上。要清晰阐述以人为本和自动化带来的益处,这些益处将帮助你的受众提升到一个新的层次。
I would advise, however, that you keep your focus on the message, not the medium. Articulate the human-centric and automated gains that will take your audience to the next level.
接下来,让我们概述一下我们将遵循的七步路线图,以便有效地展示我们的 GenAISys,然后再在接下来的章节中更深入地探讨每个组件:
Let’s now outline the seven-step roadmap we’ll follow to effectively showcase our GenAISys before exploring each component in greater depth in the upcoming sections:
有了这份路线图,让我们开始介绍核心的 GenAISys。
With that roadmap in place, let’s begin by presenting the core GenAISys.
导航到笔记本的“1. The Generative AI Chat Interface”部分,运行以下单元格以显示01.html页面:
Navigate to the 1. The Generative AI Chat Interface section of the notebook, and run the following cell to display the 01.html page:
display_interface("/content/01.html")
此输出与我们在本书中精心开发的 IPython 接口完全一致。但是,通过将其显示为网页,我们强调了我们的 GenAISys 接口是平台无关的——它可以托管在任何地方,嵌入到演示文稿中,或通过 Web 浏览器访问,如图10.3所示:
This output mirrors exactly the IPython interface we have carefully developed throughout the book. However, by displaying it as a web page, we emphasize that our GenAISys interface is platform-independent—it can be hosted anywhere, embedded in presentations, or accessed via web browsers, as illustrated in Figure 10.3:
图 10.3:生成式 AI 聊天界面
Figure 10.3: Generative AI Chat Interface
The descriptions provided during your presentation need to be user-friendly and intuitive. You must adapt your level of technical terminology to the audience. Start with clear, accessible terms, then expand into more technical or, conversely, simpler explanations as needed. The flexibility of our explanations must match the flexibility of the GenAISys we’ve built.
The following are some ways to handle the many questions an audience may ask and the points we must address. The technical details have been covered throughout the book; the idea here is to prepare you to answer tricky questions. Let’s go through each component:
Figure 10.4: Manual or automatic user selection
Figure 10.5: Selecting a reasoning function
Figure 10.6: Selecting a generative AI model
We can conclude by explaining that the GenAISys interface is just the frontend: the GenAISys proof of concept is a sophisticated, scalable, and secure architecture behind it. The interface provides a seamless, multi-user, high-performance system that allows interactions from around the world. Also explain that this proof of concept can be deployed seamlessly onto cloud services such as AWS, Microsoft Azure, IBM Cloud, or Google Cloud. At this point, we can delve into the details of the functionality developed in this book and adapted to the specific use case at hand.
A vector store is an important component of the GenAISys you built. First, explain that Pinecone was implemented, but that you can implement another vector store if needed. Then, demonstrate how visual elements can inform the user of the Pinecone index’s status on the frontend interface.
Start by duplicating 01.html to a new file named 02.html to keep the interface’s initial presentation as it was. Why? A fully populated web interface can confuse your audience, while a step-by-step approach ensures you don’t lose anybody along the way. The customer might also not want to see the Pinecone connection, or might not want to go further and instead start the project directly from 01.html. We must remain flexible and adaptable in all these scenarios.
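The duplication step itself can be scripted so that each presentation state is frozen before you modify the next one. A small sketch, where the file names follow the chapter's 01.html/02.html convention and the temporary directory only stands in for /content:

```python
import shutil
import tempfile
from pathlib import Path

def next_step(src: Path, dst: Path) -> Path:
    """Duplicate the current interface file so the earlier step stays intact."""
    shutil.copyfile(src, dst)
    return dst

# Demo in a scratch directory standing in for /content
work = Path(tempfile.mkdtemp())
(work / "01.html").write_text("<h1>step 1</h1>", encoding="utf-8")
copy = next_step(work / "01.html", work / "02.html")
```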
In 02.html, first add a clear Pinecone connection indicator:
<div class="container">
<h1>Generative AI Chat Interface</h1>
<!-- Pinecone Connection Status -->
<div id="pineconeStatus" style="text-align: center; margin-bottom: 10px;">
<span style="color: green;">●</span> Connected to Pinecone Index
</div>
<!-- Existing form elements -->
...
</div>
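Rather than hard-coding the green dot, the indicator can be generated from an actual connection check. A hypothetical Python helper that emits the same `<div>` (the red "unreachable" variant is an assumption of this sketch, not from the book):

```python
def pinecone_status_html(connected: bool) -> str:
    """Render the Pinecone status indicator for the frontend."""
    if connected:
        color, label = "green", "Connected to Pinecone Index"
    else:
        color, label = "red", "Pinecone Index unreachable"
    return (
        f'<div id="pineconeStatus" style="text-align: center; margin-bottom: 10px;">\n'
        f'  <span style="color: {color};">●</span> {label}\n'
        f'</div>'
    )
```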
The customer may also want to review the Pinecone retrieval results before they augment the input to the generative AI model. In this case, we could add a new section:
<div class="output-area">
<p><em>Conversation output will appear here...</em></p>
</div>
<div id="pineconeResults" style="background: #fff; padding: 10px; border: 1px solid #ccc; margin-top: 10px;">
<h3>Context Retrieved from Pinecone</h3>
<p><em>No results yet.</em></p>
</div>
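On the backend side, the retrieved matches need to be formatted before they replace the "No results yet." placeholder. One possible sketch, assuming each match carries a `text` and a `score` field (the shape of real Pinecone matches may differ):

```python
def render_results(matches: list) -> str:
    """Format retrieved context for the pineconeResults panel."""
    if not matches:
        return "<p><em>No results yet.</em></p>"
    items = "".join(
        f"<li>{m['text']} (score: {m['score']:.2f})</li>" for m in matches
    )
    return f"<ul>{items}</ul>"
```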
We can then run the cell to display the interface:
display_interface("/content/02.html")
The interface now contains the possible enhancements we could apply, as shown here:
Figure 10.7: Enhancing the interface with Pinecone features
To present Pinecone, we could drill down further and show the structure of a vector store by going to our Pinecone console at https://www.pinecone.io/. You can summarize what was covered in Chapter 3 while showing the Chapter03/Pinecone_RAG.ipynb version of the notebook with a checklist such as the following one:
Before drilling down into a notebook, however, make sure your audience understands the meaning of vectors and vector stores. Otherwise, simply explain that the vector store provides a highly efficient way of retrieving data and augmenting generative AI inputs to obtain better results.
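If a simple explanation is needed, a toy example can help: a vector store keeps embeddings and returns the entries whose vectors point in the most similar direction to the query. This miniature sketch uses made-up 3-D vectors instead of real embeddings:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_match(query: list, store: list) -> dict:
    """Return the stored entry most similar to the query vector."""
    return max(store, key=lambda item: cosine(query, item["vector"]))

store = [
    {"text": "travel policy", "vector": [1.0, 0.0, 0.0]},
    {"text": "sales report", "vector": [0.0, 1.0, 0.0]},
]
best = top_match([0.9, 0.1, 0.0], store)
```

A real vector store such as Pinecone does the same ranking over millions of high-dimensional embeddings with approximate nearest-neighbor indexes, which is where the efficiency comes from.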
You can also run any other notebook that queries the Pinecone index or one you built. Note that the notebooks in this book are educational, so you might want to build on them to create your own notebooks for a presentation. Let’s now introduce KPIs in a human-centric environment.
A human-centric implementation of a GenAISys remains the best way to deploy AI. A human-centered approach seems counterintuitive at first. Why not just replace employees? Why not lay off all the people who can be replaced? Why not get a lot of cash out of firing employees? After all, why not drastically reduce the workforce since a GenAISys can save so much time and money? Why not simply invest the money and earn a rapid ROI? A simple, counter-intuitive ROI Scenario 1 through layoffs or “replacement” would look as follows:
Indeed, certain corporations have pursued extensive layoffs, choosing to rely heavily on AI. They might even appear successful. However, before following that path, consider my personal experience and perspective, then choose how you wish to proceed. Let’s go through my vision of ROI through growth.
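To make the comparison concrete during a presentation, you can put rough numbers behind both scenarios. Every figure below is an illustrative assumption, not data from the book; the point is only that recurring margin growth can dwarf one-off payroll savings:

```python
def roi(gain: float, cost: float) -> float:
    """Simple ROI ratio: net gain divided by project cost."""
    return (gain - cost) / cost

annual_revenue = 100_000_000    # assumed revenue of the business unit
project_cost = 1_000_000        # assumed GenAISys project cost

# Scenario 1: ROI through layoffs ("replacement")
layoff_savings = 1_500_000      # assumed one-off payroll reduction
roi_layoffs = roi(layoff_savings, project_cost)

# Scenario 2: ROI through growth (a 3% gross-margin increase, as in the
# warehouse example that follows)
margin_uplift = annual_revenue * 0.03
roi_growth = roi(margin_uplift, project_cost)
```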
Figure 10.8 illustrates my vision of AI implementations and policy, based on real-life AI projects over the past decades. It represents ROI achieved by increasing sales through GenAISys neuroscientific marketing, shortening production-to-delivery time to market with generative AI, including CoT, and boosting sales with a GenAISys.
Figure 10.8: Evolution through growth
How is that possible? Am I an idealist? Let me break this down with a real-life AI project I implemented for a distribution center in a mega-warehouse of 100,000+ square meters, which is roughly 1.076+ million square feet. The project was a ground-up hybrid AI and advanced mathematics project:
Figure 10.9: Mega-warehouse AI-driven optimization
So why not replace the many planners who work 24/7 to solve these issues automatically? Everybody first thought this was possible, and I found that it could be done to a certain extent. However, I discovered quickly that many parameters the planners considered were not predictable. Generative AI systems, and even rule-based systems, excel at detecting patterns but are at a loss when there are none! They struggle significantly when encountering unprecedented, non-repeating scenarios.
Consider this actual warehouse scenario: stickers are missing from crate #100234, while another crate contains a faulty product (#343445). Simultaneously, an automated guided vehicle (AGV) is inexplicably operating slower than usual. The truck at Pier 94 must depart immediately, as it’s nearly 11:00 p.m. on a Saturday, and trucks cannot use highways on Sundays. Any delay now risks severe penalties and customer dissatisfaction.
I found that many similar problems occurred daily. So, yes, a hybrid GenAISys with powerful mathematical functions could automate many decision-making processes. And yes, this represented a lot of gross margin over a year in large warehouses. But what about all the unsolved problems that an AI cannot solve? Remember, in distribution supply chains, late deliveries incur stiff penalties and possibly customer distrust, leading to the loss of many contracts.
At that point, I worked 24/7 on another approach. Let’s keep the personnel, especially the planners, even if increased productivity requires fewer personnel overall in the warehouse. Why and how? I designed an interface that contained a KPI like the one we will add in this section with an AI agent and a human expert. Figure 10.10 illustrates how an expert can visualize KPIs in real time. Every time the expert ran the AI system and completed the decision, taking complex unplanned events into account, the system displayed how much money was saved, such as not having to use a second truck, switching orders to optimize personnel, and having the extra time to fix errors.
Figure 10.10: Human-centric GenAISys collaboration
The expert planner thus increased the speed of the incoming and outgoing storage events. The warehouse could process more units without having to build additional piers, purchase new AGVs, or hire more hands. The productivity gains represented a 3% gross margin increase for the mega-warehouse. Such a significant surge in gross margin in warehouse management generated substantial profit.
You can design the ROI ratio with the parameters you wish, but the core concept is that collaborative human-AI growth is highly productive. The ROI through growth produced far more margin with a human-centric system than laying off personnel:
This successful project had a huge impact on my reputation and sales:
Ultimately, the choice between aggressive layoffs and strategic growth remains yours. This example simply reflects my personal approach and experiences. In any case, if the customer wants a real-time KPI, let’s add it to the interface.
If your customer is interested, here’s how we can implement a real-time KPI into the existing web interface, emphasizing the human-centric approach:
Copy the previous version (02.html) to a new version (03.html) to preserve the earlier interface state. Then, add the KPI panel, followed by its styles and update script:
<!-- Gross Margin KPI Panel -->
<div class="kpi-panel" id="grossMarginKPI">
<h2>Gross Margin</h2>
<div class="kpi-value" id="grossMarginValue">Loading...</div>
</div>
.kpi-panel {
background: #fff;
padding: 15px;
margin: 20px auto;
border: 1px solid #ccc;
border-radius: 5px;
text-align: center;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
max-width: 300px;
}
.kpi-panel h2 {
margin-bottom: 10px;
font-size: 20px;
}
.kpi-value {
font-size: 36px;
font-weight: bold;
color: #007bff;
}
// Example function to update the KPI
function updateGrossMargin() {
// Replace this with your real-time data-fetching logic
const grossMargin = fetchGrossMarginFromBackend(); // your API call here
document.getElementById('grossMarginValue').textContent = `$${grossMargin.toFixed(2)}`;
}
// Simulate real-time update every 5 seconds
setInterval(updateGrossMargin, 5000);
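The `fetchGrossMarginFromBackend()` call above is a placeholder. Whatever service ends up behind it, the value it returns boils down to a simple computation; a hypothetical Python sketch of the backend side, where the function names and formatting are assumptions:

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a currency amount, the value the KPI panel displays."""
    return revenue - cogs

def format_kpi(value: float) -> str:
    """Mirror the frontend's `$${grossMargin.toFixed(2)}` formatting."""
    return f"${value:.2f}"
```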
Now run the cell that displays 03.html in the 3. KPI section of the notebook:
display_interface("/content/03.html")
This simulation displays real-time updates of the gross margin KPI, visually reinforcing the value human planners bring to GenAISys implementations:
Figure 10.11: A human-KPI relationship
The customer may wish to delete this function or enhance it to adapt it to their project. Our role is to adapt to the project’s needs, regardless of the level of automation required. We will now proceed to present the integration of our GenAISys in the customer’s environment.
Integration is often among the most challenging stages of deploying a GenAISys, particularly when it comes to selecting platforms, operating systems, and frameworks. While start-ups or internal projects might initially have the luxury of freely selecting platforms, real-world scenarios often involve stringent constraints from investors, customers, or internal policies. The following examples are drawn from real-world situations that illustrate the potential challenges you may face:
The list of possible hosting platforms, security, and privacy frameworks is limitless! Your GenAISys proof of concept must remain flexible. This flexibility is why the frontend HTML page we’ve built is designed for quick adaptation to specific customer requirements—without necessitating deep modifications at the backend. Then, you can take the customer as deep as requested with the educational components built in this book and the ones you have added to prepare a professional-specific presentation.
But integration doesn’t stop here! Creative and unexpected requests can arise. A potential customer might ask you the following two questions:
Once you address these challenges, further integration demands may emerge around enterprise-level ERPs (such as SAP or Oracle), specific database solutions, or even certifications to authenticate your GenAISys’s compliance and security measures. To that, I say: welcome to the real world! You must hold on to one basic principle: no matter how far you develop your GenAISys, potential customers will often ask for more, or for a different way to implement your system. Your GenAISys provides legitimacy. Beyond that, remain a flexible expert!
Let’s now demonstrate our willingness to adapt to any situation. Copy 03.html and name it 04.html to keep the interface’s previous step as is. Add this section to the HTML frontend page:
<!-- ERP Integration Dropdown -->
<div class="form-group">
<label for="erpIntegrationSelector">ERP, database, platform and meeting Integration:</label>
<select id="erpIntegrationSelector">
<option value="none" selected>Select ERP or Meeting API</option>
<option value="SAP">SAP</option>
<option value="Oracle">Oracle</option>
<option value="Microsoft Dynamics">Microsoft Dynamics</option>
<option value="NetSuite">NetSuite</option>
<option value="AWS">AWS</option>
<option value="Google Cloud">Google Cloud</option>
<option value="Azure">Azure</option>
<option value="Zoom API">Zoom</option>
<option value="Teams API">Teams</option>
<option value="Other">Other</option>
</select>
</div>
Now, run the page to display the updated interface:
display_interface("/content/04.html")
This newly added integration selection appears just below the Reasoning dropdown (as shown in Figure 10.12). It’s an excellent visual aid to discuss multimodal CoT, ERP integrations, or specific platform compatibilities.
Figure 10.12: ERP, database, platform, and meeting integration list
Before adding this dropdown, ensure you’re thoroughly prepared. Familiarize yourself with each listed integration (illustrated in Figure 10.13), as your audience may ask specific, detailed questions on each item:
Figure 10.13: The list of some of the possible integrations
Remember, the integration flexibility your GenAISys offers extends beyond these examples. It might even include MAS implementations—another sophisticated approach we will now explore.
At this point in the presentation, you can pivot from showing what your GenAISys currently does to highlighting the future it enables. Introducing a MAS is not just a new feature—it’s a glimpse into the transformative potential of generative AI, demonstrating how your system’s core design can handle complex tasks. It captures the very philosophy of this book: human-centric, scalable, and collaborative AI.
When deciding whether or not to present it, consider the following parameters:
If you choose to present the MAS, frame it clearly as an innovative game-changer. It can operate autonomously or partially autonomously—either standalone or through a simplified interface within your existing GenAISys.
To see the full implementation of this MAS, open the GenAISYS_&_MAS.ipynb notebook. The program’s flow and code are designed to be highly self-explanatory, with each section having a clear explanation of its purpose. We, meaning the human author and Gemini 2.5 Pro/ChatGPT, my co-workers for this project, have taken care to add detailed comments directly in the code to explain not just what the functions do but also the design decisions behind them—from the prompt engineering that guides the agents to the asynchronous patterns that enable the swarm to run efficiently.
The notebook has three key features:
The MAS is built from scratch with asyncio and aiohttp rather than using a pre-existing agentic framework. This from-scratch approach serves a dual purpose. First, it demystifies the core mechanics of how multi-agent orchestration works, an educational goal of this project. Second, it ensures the resulting framework is as flexible and modular as possible, free from the constraints of any single platform, and ready to be adapted to any real-world business requirements.
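In the same from-scratch spirit, a minimal swarm can be expressed with nothing but asyncio: each agent is a coroutine, and the orchestrator fans a task out to all of them concurrently. The agent roles below, and the `sleep` standing in for an aiohttp call to an LLM, are illustrative assumptions rather than the notebook's actual code:

```python
import asyncio

async def agent(role: str, task: str) -> str:
    """One agent in the swarm; the sleep stands in for an aiohttp LLM call."""
    await asyncio.sleep(0)
    return f"{role} handled: {task}"

async def orchestrate(task: str) -> list:
    """Fan the task out to every agent concurrently and gather the results."""
    roles = ["researcher", "writer", "critic"]
    return await asyncio.gather(*(agent(r, task) for r in roles))

results = asyncio.run(orchestrate("draft the product brief"))
```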
Figure 10.14: MAS workflow overview
The preceding figure shows the main components of the MAS:
Let’s now look into the strategic integration options.
Now that we’ve established the what and the why of this MAS, the crucial business question is how it fits with the GenAISys we’ve presented.
Figure 10.15: The strategic integration models for the MAS
As shown in Figure 10.15, we can offer three strategic visions for its deployment:
This integrated vision elevates your GenAISys beyond mere functionality into a fully realized, enterprise-ready AI ecosystem.
Having built your GenAISys from the ground up, you now possess a powerful, flexible AI controller capable of dynamic orchestration and advanced reasoning. This solid foundation uniquely positions you to embrace emerging standards such as the Model Context Protocol (MCP) and the Agent Communication Protocol (ACP):
Your expertise in designing intelligent AI pipelines and agentic workflows means you are well equipped to leverage and even contribute to these powerful interoperability frameworks, taking your business-ready AI solutions to the next level. Now comes a more difficult aspect: security constraints and privacy regulations.
Security and privacy often inherit constraints from the selected hosting platforms or frameworks decided in the initial integration phase. However, even if your system aligns with established standards, you must clearly demonstrate your moderation and data security components within the GenAISys. Always be ready to face rigorous questions on these topics. It’s strongly recommended to have a certified security expert alongside you during the presentation. If that’s not possible, ensure you demonstrate your willingness to adapt your GenAISys according to the customer’s established protocols and frameworks.
Here are essential security layers that your GenAISys might need to integrate or align with—review them carefully to be ready for challenging questions, even if another team or external service manages these security layers:
You don’t have to be an expert, but at minimum, you should comfortably discuss these areas using accurate terminology. Review the provided links carefully to ensure you are up to date with contemporary security practices.
Let’s showcase how your system might handle real-time security alerts effectively. As usual, to avoid overwhelming your audience, copy 04.html to create 05.html, ensuring the previous presentation state remains untouched if you decide not to proceed further in the demonstration.
We’ll add a simple, visually intuitive red alert banner to our flexible HTML interface:
<!-- Place this alert banner near the top of your container -->
<div id="securityAlert" style="display:none; background-color: #ffdddd; color: #a94442; padding: 15px; border: 1px solid #ebccd1; border-radius: 4px; margin-bottom: 20px;">
<strong>Security Alert:</strong> Suspicious activity detected.
</div>
Then, in your JavaScript, you can easily toggle its visibility based on certain conditions:
// Example function to display the alert
function showSecurityAlert() {
document.getElementById('securityAlert').style.display = 'block';
}
// Example function to hide the alert
function hideSecurityAlert() {
document.getElementById('securityAlert').style.display = 'none';
}
// Simulate a security event after 5 seconds (for demonstration purposes)
setTimeout(showSecurityAlert, 5000);
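What actually triggers `showSecurityAlert()` must come from the customer's security policy rather than a timer. A hypothetical server-side rule, with thresholds that are pure assumptions to be replaced during implementation:

```python
def should_alert(failed_logins: int, requests_per_minute: int) -> bool:
    """Flag suspicious activity for the frontend banner (thresholds assumed)."""
    return failed_logins >= 5 or requests_per_minute > 1000
```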
In this example, the alert banner appears automatically after five seconds, visually illustrating how a real-time security notification might look. Read the code to be able to modify it in real time or at least rapidly during a meeting if requested. Some customers might dislike the way it’s displayed or its color.
Sometimes, modifying the frontend page shows your willingness to adapt to the customer’s needs quickly, as we’ve reiterated several times now. On the other hand, careful and considered modifications signal caution and reliability. Make a strategic choice depending on your understanding of your audience’s needs and expectations.
The alert can come from your system or the hosting environment. Now, let’s run the code to display the updated interface:
display_interface("/content/05.html")
A security alert will be displayed after five seconds, as illustrated here:
Figure 10.16: A security alert banner in real time
The security alert conditions must be precisely defined during the project’s implementation stage to ensure alignment with the customer’s exact security policy. With security and privacy considerations addressed, let’s now move on to the nuanced and often challenging area of customization.
As with most software, the GenAISys’s path to success relies on our ability to customize an application. No matter how hard we try, the end users will request interface and process evolutions. We can ignore them or accept them. The middle ground is to accept the requests that are feasible and within the scope of the project, and find as many realistic workarounds as possible for evolutions that would require fundamental modifications.
There are many possible customization approaches; here, we focus on a practical three-phase, human-centric customization method:
Figure 10.17: A three-phase human-centric customization process
All three phases will progress continuously in workshop cycles. Reinforce your participants’ roles as active co-creators, rather than passive consumers of technology who only express their wishes.
Again, copy 05.html and name it 06.html. Add some icons to trigger reactions from the audience so that they can express their creativity, ideas, and needs. If you succeed in involving your audience, you’re already on your way to success.
The icons in this code come from Font Awesome. They are made available by including the Font Awesome CSS library via a link in <head>. Then, the icons are inserted throughout the HTML using <i> tags with the appropriate Font Awesome classes. Here are the key snippets that you can customize in real time or near-real time during your workshops:
First, include the Font Awesome library:
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
<h2><i class="fas fa-chart-line"></i> Gross Margin</h2>
<label for="userSelector"><i class="fas fa-user"></i> User:</label>
<label for="messageInput"><i class="fas fa-comment"></i> Your Message:</label>
<button id="sendButton" title="Send your message"><i class="fas fa-paper-plane"></i> Send</button>
<label for="reasoningSelector"><i class="fas fa-brain"></i> Reasoning:</label>
<label for="modelSelector"><i class="fas fa-robot"></i> Model:</label>
Each snippet leverages Font Awesome’s intuitive icon library (such as fas fa-chart-line, fas fa-user, etc.), ensuring clear visual communication of each component’s purpose. Here, too, you’ll have to be prepared to swiftly modify these icons in real time during workshops to demonstrate your agility in adapting to user feedback.
Now, run the web page to visualize your icon-rich interface:
display_interface("/content/06.html")
The page now contains icons, as illustrated in the following excerpt of the HTML page:
Figure 10.18: Customizing the web page with icons
Do not take this topic lightly! An end user only sees the frontend web page and its outputs (text, images, and popups). Spend all the necessary time listening to the customer (your colleagues or external customers). The more the end users work with you in this area, the more it becomes their GenAISys.
The tricky part is fitting in phase 2 (GenAISys) development and phase 3 (GenAISys and web page integration). You can fast-track the project by implementing all three phases in parallel, as illustrated in Figure 10.19.
Figure 10.19: Working in teams on all three phases in parallel on-site or remotely
The choice depends on the project and your vision. Your deployment approach will create the continual momentum required for a fast-tracked, eventful, optimized time-to-market project. However, each project advances at its own speed, and the best pace is the one the users choose, giving them time to adapt to your cutting-edge GenAISys.
At some point, you will inevitably need to explain the resources required for the project and its associated costs. So, let’s move on and discuss that.
Every successful software project clearly specifies the human and machine resources required. Machine resources usually align closely with the platforms identified during the earlier integration discussion (refer to the Integration: Platforms and frameworks section in this chapter). Human resources, however, require more careful consideration and explicit detailing using a structured approach such as the RACI matrix outlined in Chapter 1. Recall that a RACI matrix classifies project responsibilities into four categories:
Section 7 (RACI) of the notebook provides a practical example of a RACI matrix specifically designed for the GenAISys we’ve developed in this book. Figure 10.20 shows an excerpt from this RACI ideation, providing a concrete starting point to build your project-specific RACI:
Figure 10.20: Excerpt of the RACI ideation for the GenAISys built in this book
While presenting, you will need to explain that the RACI Matrix for GenAISys Implementation ideation clearly outlines a coordinated, cross-functional approach to deploying an advanced AI system:
Using the RACI matrix as your reference point will take you directly to the costs you must estimate and negotiate (in your company, for investors, or with a customer). Clearly defined roles will help you put a number to the human hours, allocate budgets effectively, and identify areas requiring specialized expertise. This allows you to transparently estimate project costs, negotiate resource allocation internally or externally, and segment your project into manageable milestones or deliverables.
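Turning the RACI roles into numbers can be as simple as aggregating person-days per phase. The sketch below assumes invented daily rates and effort figures purely for illustration; the role abbreviations (PM, MLE, BE, SE) follow the governance roles used in this book, but none of the amounts come from the book's RACI matrix:

```python
# Hedged sketch: derive a cost estimate from a RACI-style staffing plan.
# Rates and person-day counts are illustrative assumptions, not book data.
RATES = {"PM": 800, "MLE": 950, "BE": 700, "SE": 850}  # daily rates

PLAN = [
    # (phase, role, person-days)
    ("Phase 1: presentation layer", "PM", 5),
    ("Phase 2: GenAISys development", "MLE", 30),
    ("Phase 2: GenAISys development", "BE", 20),
    ("Phase 3: web integration", "BE", 15),
    ("Phase 3: web integration", "SE", 8),
]

def estimate(plan, rates):
    """Aggregate person-days into a cost per phase and a project total."""
    per_phase = {}
    for phase, role, days in plan:
        per_phase[phase] = per_phase.get(phase, 0) + days * rates[role]
    return per_phase, sum(per_phase.values())

per_phase, total = estimate(PLAN, RATES)
for phase, cost in per_phase.items():
    print(f"{phase}: {cost:,}")
print(f"Total: {total:,}")
```

A breakdown like this maps each negotiable line item to a RACI role, which makes the budget conversation concrete: change a rate or a day count and the phase subtotal and project total update immediately.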
And with that, you are now business-ready for a GenAISys. The market is yours to conquer!
In this chapter, we designed a strategic framework for showcasing a business-ready GenAISys amid intense market competition. We saw that technological innovation alone is not enough to secure customers or funding. A compelling business narrative must accompany the technical proof of concept to demonstrate real-world value. One fast-track approach is to begin with a concise introduction of the team and quickly establish our credibility and expertise. Then, pivot the presentation to a clear explanation of the GenAISys we have built, underpinned by real-time KPIs.
We designed a flexible frontend web page to demonstrate the system’s capabilities. The idea is to have an adaptable HTML page that can be modified in real time or rapidly, depending on the situation. The frontend web page translates the IPython interface we built into an accessible, customizable HTML display. Live data updates and real-time simulations reinforce the system’s practical impact. We designed a narrative that stresses the GenAISys’s adaptability and ability to evolve with customer needs. At all times, you saw how you could drill down into the main concepts taught in this book and enhance them with your developments. You also have your own GenAISys built from scratch and ready to present, which can be adapted to the specifications of the project you are working on.
This chapter favored a human-centric approach, ensuring AI augments rather than replaces expert decision-making. The system is positioned as a collaborative tool designed to boost operational efficiency. Real-time KPIs can reinforce the GenAISys’s profitability and strengthen the bonds between the GenAISys and the teams using it. Innovative features such as vector stores enhance data retrieval for improved outputs. We showed how to present a forward-looking vision by demonstrating a prospective evolution of the system into a swarm-based MAS, proving the framework’s scalability and introducing the powerful narrative of human-AI co-worker collaboration. We presented integration with major cloud platforms and ERP systems. We showed the possibility of integrating services such as AWS, Google Cloud, and Microsoft Azure, highlighting its versatility. Security and privacy were addressed rigorously through possible real-time alerts and adaptive measures.
Flexibility remains key, enabling the system to meet diverse enterprise requirements. The approach balances cutting-edge technology with tangible business outcomes. An iterative process encourages customization based on ongoing client feedback. We presented interactive workshops as a means of tailoring the system on the fly. A RACI matrix was introduced to clearly delineate roles and streamline project management. Ultimately, the chapter provided a comprehensive blueprint for deploying a dynamic, market-ready GenAISys presentation. We balanced technical depth with clear business benefits.
You are now business-ready to design, build, and deliver a GenAISys!
New frameworks, evolving architectures, research drops, production breakdowns—AI_Distilled filters the noise into a weekly briefing for engineers and researchers working hands-on with LLMs and GenAI systems. Subscribe now and receive a free eBook, along with weekly insights that help you stay focused and informed.
Subscribe at https://packt.link/TRO5B or scan the QR code below.
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
If you enjoyed this book, you may be interested in these other books by Packt:
Building AI Agents with LLMs, RAG, and Knowledge Graphs
Salvatore Raieli, Gabriele Iuculano
ISBN: 978-1-83508-038-2
AI Agents in Practice
Valentina Alto
ISBN: 978-1-80580-134-4
If you’re interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Now that you’ve finished Building Business-Ready Generative AI Systems, we’d love to hear your thoughts! If you purchased the book from Amazon, please click here to go straight to the Amazon review page for this book and share your feedback or leave a review on the site that you purchased it from.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
A
Agent Communication Protocol (ACP) 384
AI controller 3
GenAISys memory, building 6, 7
model-agnostic approach, to generative AI 5
model-agnostic approach, to generative AI model 4
RAG core functions 8
AI controller orchestrator 41, 58
GPT model, working 61
implementing, for instruction selection 64, 65
scenario, running with generative AI agent 70
scenario, selecting 66
AI controller orchestrator, scenario
intent recognition, performing 68, 69
scenario selection, performing 68, 69
task/instruction scenarios, defining 66-68
AI functions 251
chain of thought 256
fallback handler (memory-based) 261-266
RAG 251
weather forecast function, adding 334, 335
analysis handler 249
automated guided vehicle (AGV) 373
B
backend engineer (BE) 12
business-ready GenAISys, components 2, 3
human roles 11
C
chain-of-thought (CoT) reasoning 152, 173
in GenAISys, versus traditional software sequences 174, 175
running, from user perspective 183-186
chat(user_message) function 121
cleanse function 54
cloud engineer (CE) 12
cognitive flow, CoT reasoning 175-177
activities suggestion 181
decision tree classifier 178-180
image analysis 182
image generation 181
complex system message, designing
additional instructions 206
dimensions 205
memory analysis, running 207-209
memory encoding tags 204
model introduction 202
primary task recall 207
purpose 203
response format 206
sentiment score 205
consumer GenAISys memory agent
designing 191
GenAISys architecture, enhancing 196, 197
memory structures, defining 193-196
consumer memory agent
complex system message, designing 202
custom message creation 215-217
image creation 215
memory and sentiment analysis 201
sentiment scores extraction 209-211
contextual awareness and memory retention 19
long-term memory of multiple sessions 33, 34
long-term memory session of multiple cross-topic sessions 34, 35
short-term memory session 31, 32
stateless and memoryless session 28
conversational AI agent 41, 42, 125-129
environment, setting up 43
long-term memory retention 42
short-term memory retention 42
conversational AI agent workflow 43, 57
full-turn conversation loop 48
initial conversation, starting 44
long-term memory session 52-56
long-term multi-user memory 56, 57
running 50
short-term memory session 51, 52
conversational AI agent workflow, full-turn conversation loop
API call, making 49
API response, appending 49
assistant’s response, displaying 49
conversation log, exiting 50
conversation log (loop), cleansing 49
conversation log, saving 50
exit condition, checking 49
session termination 50
user input, appending to messages object 49
user input prompt 48
conversational AI agent workflow, initial conversation
API variables, initializing 45
conversation loop 47
initial API call, making 47
initial API response, appending 47
initial assistant’s response, appending 47
initial conversation log, cleansing 46
initial user input, handling 46
messages object, initializing 45
run_conversational_agent function 45
welcome message, printing 45
Corpus of Linguistic Acceptability (CoLA) 61
CoT loops 80
D
data engineer (DE) 12
Data Privacy Framework (DPF) 377
data privacy officer (DPO) 14
data retrieval (data01) 255
data security function 312
building 321
Pinecone index, populating 321, 322
Pinecone index, querying 323-325
DeepSeek Hugging Face environment
setting up 232
DeepSeek-R1-Distill-Llama-8B 228
implementing 231
model, downloading from 233
DeepSeek-R1 paper 227
E
emotional memory 193
episodic query
with semantic undertone 29
event-driven GenAISys interface
functionalities 163
IPython interface enhancements 155-160
event-driven GenAISys interface, environment setup 163
CoT reasoning 167
gTTS 165
event-driven GenAISys interface processes
bot response, generating 124
chat(user_message) function 121
continue process 123
exit process 122
input box event 121
modules and libraries, importing 117, 118
UI container widget, displaying 120
update_display function 124, 125
widgets, initializing 118, 119
F
fallback memory handler 250
flexible HTML interface
advanced frameworks, showcasing 380-382
core GenAISys, presenting 364-367
GenAISys resources (RACI) 390, 391
human-centric implementation 370
real-time KPI, adding to web interface 374, 375
strategic integration options, for MAS 383, 384
vector store, presenting 368-370
full-scale GenAISys 18
key characteristics 18
use case examples 19
G
architecture, enhancing 196, 197, 310, 311
enabling 80
mobility intelligence, adding 292-294
presentation, designing 356-358
GenAISys, business opportunities and scope 16
full-scale GenAISys 18
hybrid approach 17
small-scale GenAISys 17
GenAISys interface 217
AI agent, enhancing 220
capabilities, generalizing 221-223
generation handler 249
generative AI model
executes 4
orchestrates 4
Google Cloud Platform (GCP) 92
GPT-4o
embedding 59
similarity search 59
task execution 59
GPT model
Corpus of Linguistic Acceptability (CoLA) 61, 62
Semantic Textual Similarity Benchmark (STSB) 63
summarization 64
translation task 63
graph neural networks (GNNs) 271
H
handler 240
versus if...then list 240
analysis handler 249
fallback memory handler 250
features 247
generation handler 249
image handler 249
Pinecone/RAG handler 248
reasoning handler 248
structure of key properties 247
handler selection mechanism 227, 244-246
data security function, building 321
file management 243
implementing, as orchestrator 238, 239
interactions 316
IPython interface modifications 240-242
moderation function, implementing 317-320
security checks, running 325-328
security function, adding 312
human roles 11
GenAISys governance teams 11-13
hybrid approach 17
key characteristics 17
use case example 17
I
image creation handler 249
implicit memory 193
instruction block 272
instruction tuning 273
IPython interface
handle_submission() logic 296
mobility value, handling in update_display() 296
option in instruct_selector, creating 294, 295
ipywidgets 117
K
K-means clustering (KMC) 10
k-nearest neighbors (KNN) 10
L
large language models (LLMs) 269
long-term memory (LTM) 191
long-term memory of multiple cross-topic sessions 6
long-term memory of multiple sessions 6
long-term memory session 33, 34
long-term memory session of multiple cross-topic sessions 34, 35
M
machine learning (ML) 2
mcontent 30
memory categories
Fiction Memory 195
Long-Term Memory 195
memoryless 194
Reality Memory 195
Short-Term Memory 194
Time Memory 195
memoryless session 28
verifying 30
memory structures 193
LTM 193
STM 193
message variables, for OpenAI function
additional context 285
converging 286
user message 285
ML engineer (MLE) 11
mobility 292
mobility-enhanced GenAISys
fire disaster scenario 305, 306
production-delivery verification scenario 300-305
mobility intelligence
adding, to IPython interface 294
AI functions 298
handler selection mechanism 297
model-agnostic approach
Model Context Protocol (MCP) 384
model evolution
balancing, with project needs 228
moderation function 312
implementing 317
multi-agent system (MAS) 380
features 381
strategic integration options 383, 384
workflow overview 382
multi-turn GenAISys session 129, 130
conversation, displaying 136-138
conversation, loading 136-141
conversation, summarizing 138-141
interactive conversation 132-135
multi-user session 141
conversation, displaying 148
conversation, loading 148
conversation, summarizing 148
dialogue without AI conversational agent 146, 147
RAG for episodic memory retrieval 144, 145
N
natural language understanding (NLU) 1
O
on_submit method 119
OpenAI API key initialization
OpenAI function
messages, implementing 286-288
sending, with message variables 283
OpenAI resources
downloading 23
OpenAI API key initialization 24
OpenWeather API
URL 328
P
Pinecone
Pinecone API key
initializing 84
Pinecone console
reference link 370
Pinecone index
building 81
classical data, upserting 99
data processing 85
dataset, embedding 87
embedding model, initializing 87, 88
environment, setting up 81, 82
instruction scenarios, identifying with 79
instruction scenarios, upserting 95-99
querying 102
results, returning 104
vector store, querying 104
Pinecone/RAG handler 248
prediction function
instructions, preparing 283
OpenAI function 283
procedural memory 193
product designer and production manager (PDPM) 228
project manager (PM) 11
proof of concept 310
Q
quality assurance engineer (QAE) 12
querying functions 103
question block 273
R
RACI matrix 390
using 392
RAG core functions 8
agentic orchestrator of multifunctional capabilities 9, 10
context window optimization 9
knowledge retrieval 8
RAG function 251
data retrieval (data01) 255
semantic analysis (genaisys) 254, 255
sentiment analysis (genaisys) 253
reasoning handler 248
recurrent neural networks (RNNs) 271
Responsible, Accountable, Consulted, and Informed (RACI) 13
heatmap roles 14
retrieval-augmented generation (RAG) 1, 75
architecting for dynamic retrieval 76
hybrid retrieval 80
instruction scenarios, identifying in Pinecone index 79-81
scenario-driven task execution 77-79
run_conversational_agent function
parameters 45
S
security engineer (SE) 12
security function
adding, to handler selection mechanism 312
semantic analysis (genaisys) 254, 255
semantic query 29
Semantic Textual Similarity Benchmark (STSB) 63
sentiment analysis (genaisys) 253
short-term memory session 6, 31, 32
short-term memory (STM) 191
small-scale GenAISys 17
key characteristics 18
use case examples 18
stateless and memoryless session 6
stateless session 28
verifying 30
subject-matter expert (SME) 302
T
Text-to-Text Transfer Transformer (T5) 60
trajectory simulation
analysis and prediction 288-292
mobility orchestrator, creating 280-282
output 280
trajectory simulation and prediction framework 270
instruction tuning, for domain adaptation 273
large-scale mobility forecasting, challenges 271
missing data, handling 273, 274
pivotal innovations 272
traditional models, migrating to LLMs 272
trajectory prediction, reformulating as Q&A 272
trajectory simulation and prediction function
V
vector store 368
W
weather forecast component
handler, adding to handler registry 333
OpenWeather environment, setting up 329-331
You’re not the only one navigating fragmented tools, constant updates, and unclear best practices. Join a growing community of professionals exchanging insights that don’t make it into documentation.
Stay informed with updates, discussions, and behind-the-scenes insights from our authors. Join our Discord at https://packt.link/z8ivB or scan the QR code below:
Connect with peers, share ideas, and discuss real-world GenAI challenges. Follow us on Reddit at https://packt.link/0rExL or scan the QR code below: